Remember those college-day discussions about why anyone should upgrade to a 1.2 GHz processor when the CPU sits idle most of the time? The upgrade was never about the average case. It was for the times when the system is loaded and that extra bit of processing power helps you complete your task.

In the cloud world, the strategy we employ is to auto-scale, i.e. to provision extra resources when there is a need. This is how shopping sites cater to seasonal peak workloads. However, provisioning is not always immediate. It takes time, and if demand grows rapidly and the resources required exceed what is available on the system, the provisioning request may fail and, in turn, the system will fail to meet its SLA. This would be unacceptable.

There may be multiple applications, tenants or services running on shared resources. Heavy demand from one may impact SLAs of others.

A solution to this problem is to throttle requests when demand exceeds a certain threshold. This increases the system's tolerance and lets it keep functioning and meeting its SLAs. The system could:

  • Meter the use of resources and put a cap on requests-per-minute for each user
  • Defer or suspend operations for lower-priority applications, informing them that the system is busy and the operation should be retried later
  • Degrade the functionality of non-essential services to let other services deliver
  • Level the load by smoothing the volume of activity through an intermediate buffer
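
The first option above, a per-user cap on requests, can be sketched in a few lines. This is only an illustration under stated assumptions: the class name PerUserRateLimiter, the sliding-window approach and the injectable clock are hypothetical choices made here, not something a particular platform prescribes.

```python
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Sliding-window cap on accepted requests per user (hypothetical helper)."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # The clock is injectable (an assumption of this sketch) so the
        # limiter can be exercised without actually waiting.
        self.clock = clock
        # user id -> timestamps of the requests accepted inside the window
        self._history = defaultdict(deque)

    def allow(self, user_id):
        """Return True if the request is accepted, False if it is throttled."""
        now = self.clock()
        timestamps = self._history[user_id]
        # Forget requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True
```

A caller would check limiter.allow(user_id) before doing the work and reject or defer the request when it returns False. Because each user has a separate history, one heavy user hitting the cap does not affect the others.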

An effective system would use a combination of auto-scaling and throttling. Throttling provides an interim solution that contains the load, while auto-scaling uses that time to scale out.

However, it is important that the system can un-throttle once it has scaled out, restoring full functionality, and can revert the scale-out when the load has eased.
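
One way to picture the un-throttling decision is a switch driven by utilization, i.e. load divided by current capacity. Scaling out raises capacity, so utilization drops and the switch releases. The sketch below is an assumption-laden illustration: the ThrottleSwitch name and the two thresholds (engage high, release lower, so the switch does not flap around a single value) are choices made here, not part of any specific product.

```python
class ThrottleSwitch:
    """Turns throttling on and off from a utilization reading (hypothetical)."""

    def __init__(self, engage_at=0.85, release_at=0.60):
        # Illustrative thresholds; real values depend on the system.
        if release_at >= engage_at:
            raise ValueError("release_at must be lower than engage_at")
        self.engage_at = engage_at
        self.release_at = release_at
        self.throttling = False

    def update(self, utilization):
        """Feed the latest utilization (load / capacity); return the new state."""
        if not self.throttling and utilization >= self.engage_at:
            # Load is near capacity: start throttling.
            self.throttling = True
        elif self.throttling and utilization <= self.release_at:
            # Capacity has caught up or load has eased: restore full service.
            self.throttling = False
        return self.throttling
```

Between the two thresholds the switch keeps its previous state, which is what stops it from toggling on every small fluctuation.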

Applications need to be aware of throttling, and the system should return appropriate error codes so that applications can become more resilient to this temporary state. Throttling must be quick to switch in and switch out.
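
On the application side, being aware of throttling usually means treating the throttle error as a signal to wait and retry rather than as a failure. Over HTTP this is commonly signalled with status 429 (Too Many Requests), optionally with a Retry-After header. The sketch below avoids any real client library: ThrottledError and call_with_retry are hypothetical names introduced for illustration, and the retry limit is an arbitrary assumption.

```python
import time


class ThrottledError(Exception):
    """Raised by an operation when the service says it is busy (hypothetical)."""

    def __init__(self, retry_after):
        super().__init__(f"throttled, retry after {retry_after}s")
        self.retry_after = retry_after  # seconds the service asked us to wait


def call_with_retry(operation, max_attempts=4, sleep=time.sleep):
    """Run operation(), waiting and retrying while it reports throttling."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                # Give up and let the caller see the throttle error.
                raise
            # Honour the wait the service asked for before trying again.
            sleep(err.retry_after)
```

The sleep function is passed in so that the wait can be replaced in tests or swapped for a scheduler in an asynchronous application.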

It is also possible that the surge is short-lived, and thus the auto-scale system may not kick in at all.

Throttling is thus important when we want to prevent one tenant from monopolizing a resource and when we want to handle a burst in activity. It can be used as an interim measure while the system auto-scales. It can also be used to control when a system should scale.

Throttling helps in optimizing resource use across tenants in a multi-tenant app.