At a ticket counter, what happens when several people try to buy tickets and there is no order? CHAOS! We have seen this at movie theaters on first-day-first-shows, at railway reservation counters during festive seasons, and at airline counters when flights are cancelled and refunds are due.


The person serving at the counter has not changed, nor has the person become inefficient. The person is simply overwhelmed: so many out-of-order requests, each demanding immediate attention, introduce faults and lead to overall system failure.

So what is the solution?

An ordered Queue

An ordered Queue! You make people follow a queue where everyone is guaranteed to be served; the only trade-off is that the waiting time grows with the length of the queue.

But long queues can be frustrating, and people might lose the motivation to wait for service. An interesting solution to this problem can be found at banks, service stations, and ticket booking counters that use tokens for dynamic queue management.


Have you seen something similar? When you walk into a service station, you are immediately handed a token. With that token, you don’t have to stand in a queue; you can wait somewhere in comfort, free to do other things, and when your turn comes, you are notified of the counter where you should go to get your work done.

This is exactly how the Load Levelling Queue pattern works. It helps balance the load on a service endpoint when there is a peak in demand. Requests from applications or other services are posted to a message queue at whatever rate they arrive. The consumer service then processes messages from the head of the queue at a more consistent rate.
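The pattern can be sketched with Python's standard-library `queue` and `threading` modules. This is a minimal illustration, not a production message broker; the names `task_queue`, `producer`, and `consumer` are hypothetical, and the `time.sleep` stands in for the consumer's steady processing rate.

```python
import queue
import threading
import time

# The message queue that decouples producers from the consumer.
task_queue = queue.Queue()

def producer(n_requests):
    # Requests arrive in a burst, enqueued at whatever rate callers send them.
    for i in range(n_requests):
        task_queue.put(f"request-{i}")

def consumer(results):
    # The consumer drains the queue head at its own consistent pace.
    while True:
        item = task_queue.get()
        if item is None:          # sentinel value: shut down
            break
        time.sleep(0.01)          # simulate a steady processing rate
        results.append(item)

results = []
worker = threading.Thread(target=consumer, args=(results,))
worker.start()
producer(5)                        # a burst of 5 requests is queued instantly
task_queue.put(None)               # tell the consumer to stop after draining
worker.join()
print(results)                     # all requests served, in arrival order
```

The key point is that `producer` returns immediately after enqueuing its burst; only the consumer's pace determines how fast messages leave the queue.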


Another advantage of this queue is that even if the consumer service goes down, producers can still post messages to the queue, so a delayed or failed consumer has no direct impact on the producer.

You can scale the number of queues and the number of consumers to meet demand. The number of consumer service instances is also cost-balanced, because you only need to deploy enough instances to handle the average load, not the peak load.
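Scaling the consumer side is as simple as pointing more workers at the same queue. Here is a small sketch under the same illustrative assumptions as before; the count of three workers and nine jobs is arbitrary.

```python
import queue
import threading

jobs = queue.Queue()
processed = []
lock = threading.Lock()           # protect the shared results list

def worker():
    # Each consumer instance competes for messages from the same queue.
    while True:
        job = jobs.get()
        if job is None:           # sentinel: this worker shuts down
            break
        with lock:
            processed.append(job)

# Scale out: three consumer instances share one queue.
workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for i in range(9):
    jobs.put(i)                   # producers are unaware of the worker count
for _ in workers:
    jobs.put(None)                # one sentinel per worker
for w in workers:
    w.join()
print(sorted(processed))          # every job was handled exactly once
```

Note that with multiple competing consumers, per-message ordering is no longer guaranteed; if strict ordering matters, keep one consumer per queue or partition the work.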

Since a message queue is a one-way channel, if the producer expects a response, you need to implement a separate mechanism for delivering that response asynchronously.
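One common way to close the loop is a reply queue with a correlation ID: the producer tags each request with a unique ID and a place to send the answer, then matches the reply to the request when it arrives. The sketch below assumes this convention; the message tuple shape `(corr_id, payload, reply_q)` is an illustrative choice, not a fixed API.

```python
import queue
import threading
import uuid

request_q = queue.Queue()

def consumer():
    # Process one request, then post the result on the reply queue
    # that the producer included in the message.
    corr_id, payload, reply_q = request_q.get()
    reply_q.put((corr_id, payload.upper()))   # toy "processing": uppercase

threading.Thread(target=consumer).start()

# The producer attaches a correlation ID and its own reply queue,
# sends the request, and is free to do other work in the meantime.
reply_q = queue.Queue()
corr_id = str(uuid.uuid4())
request_q.put((corr_id, "hello", reply_q))

got_id, result = reply_q.get()    # collect the response when it arrives
assert got_id == corr_id          # match the reply back to the request
print(result)
```

Real message brokers (RabbitMQ, Azure Service Bus, and others) provide built-in support for this request/reply style via reply-to addresses and correlation IDs.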