Rate Limiting

1 minute read

Can be in an application server or a DB server or any type od servers

Overloaded systems can

increase latency
OOM errors
results in cascading failures (when one fails, all requests are transferred to another overloading the other one)

Rate Limiting Algorithms

Sliding Window Rate Limiting

Allow only a certain number of requests in a time interval. Additional requests are dropped

Issues

If sliding window is of size N, keep N requests in the memory.

finding the requests to evict needs N searches each time

The timer wheel allots requests to slots based on the time of their arrival.

Size of the wheel (Number of buckets) = Timeout of the incoming request
Bucket is numbered 0 to Timeout -1
Each bucket can store a limited number of requests (In a Linked List or a Queue)
Request is added into the bucket number = Time % Number of buckets/Size of Wheel, Modulo operator keeps it revolving
- System can pull requests from this queue one by one.
Before inserting a new request into a bucket it deletes all the existing requests (These requests have not been processed by the system)

Ways to identify the request overloading within the system

Average response time : if its increasing, then the system is overloaded
Age of meggases in the wait Queue : If the Queue is slowly processed, the average age of the request in the queue will be increasing
Increased number of requests in the Dead Letter Queue