Designing a Distributed Rate Limiter (API Gateway)
In modern distributed architectures, the "noisy neighbor" problem is a constant threat to system stability. Whether it is a malicious DDoS attack or a misconfigured internal service making recursive calls, an unprotected API is a liability. Companies like Stripe and Netflix handle billions of requests daily; for them, rate limiting is not just a security feature—it is a core component of their availability strategy.
A distributed rate limiter must solve a fundamental challenge: how to maintain a global state of request counts across hundreds of nodes without introducing significant latency. If every request requires a synchronous check against a central database, the rate limiter becomes the bottleneck. Conversely, if each node tracks limits independently, a client could bypass the limit by cycling through different edge nodes.
This post explores the design of a production-grade distributed rate limiter integrated within an API Gateway. We will examine the trade-offs between consistency and latency, the selection of algorithms, and how to scale the system to handle millions of concurrent users.
Requirements
To build a robust system, we must distinguish between what the system does (functional requirements) and how it performs (non-functional requirements). In a distributed environment, the non-functional requirements often dictate the architectural choices more than the functional ones.
Capacity Estimation
For a mid-to-large scale deployment, we estimate the following (the average RPS simply spreads 500 million daily requests over 86,400 seconds):
| Metric | Value |
|---|---|
| Daily Active Users (DAU) | 10 Million |
| Average Requests per User/Day | 50 |
| Total Requests per Day | 500 Million |
| Average RPS | ~5,800 |
| Peak RPS (3x Average) | ~17,400 |
| Storage per User (Redis) | ~64 bytes |
| Total Memory Required | ~640 MB |
High-Level Architecture
The system follows a sidecar or middleware pattern within the API Gateway. When a request enters the load balancer, it is routed to an API Gateway instance. The gateway consults a distributed cache (Redis) to determine if the request should be throttled.
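To make the request path concrete, here is a minimal gateway-side middleware sketch. The Limiter interface, the X-API-Key header, and the fail-open behaviour on errors are illustrative assumptions; the Redis-backed limiter built in the Detailed Design section below satisfies this interface.

package gateway

import (
	"context"
	"net/http"
	"time"
)

// Limiter is the contract the middleware needs; the Redis-backed
// RateLimiter shown later in this post satisfies it.
type Limiter interface {
	Allow(ctx context.Context, userID string, limit int, window time.Duration) (bool, error)
}

// RateLimitMiddleware rejects requests that exceed the per-user quota before
// they ever reach the upstream handler.
func RateLimitMiddleware(limiter Limiter, limit int, window time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Identify the caller; an API key header stands in for real authentication here.
		userID := r.Header.Get("X-API-Key")
		if userID == "" {
			http.Error(w, "missing API key", http.StatusUnauthorized)
			return
		}
		allowed, err := limiter.Allow(r.Context(), userID, limit, window)
		if err != nil {
			// Fail open: the rate limiter must never become the outage (see Failure Modes below).
			next.ServeHTTP(w, r)
			return
		}
		if !allowed {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}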
Detailed Design
We will use a sliding window strategy, which avoids the burst-at-boundary problem of the Fixed Window algorithm. There are two variants: the Sliding Window Log, which stores a timestamp per request and is perfectly accurate, and the Sliding Window Counter, which approximates the window with two counters and uses far less memory. The core implementation below uses the log variant; the counter approximation is shown afterwards as the path to massive scale.
To avoid race conditions and minimize round-trips to Redis, we use Lua scripts. This ensures the "check-and-increment" logic happens atomically on the Redis server.
Core Implementation (Go)
package ratelimit

import (
	"context"
	"time"

	"github.com/go-redis/redis/v8"
)

// RateLimitLua trims expired entries, checks the count, and records the new
// request in a single atomic script, so no check-and-increment race is
// possible between gateway nodes.
const RateLimitLua = `
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])       -- window length in milliseconds
local current_time = tonumber(ARGV[3]) -- current time in milliseconds
local member = ARGV[4]                 -- unique member for this request
local window_start = current_time - window

-- Evict entries that have slid out of the window.
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)

local current_count = redis.call('ZCARD', key)
if current_count < limit then
	redis.call('ZADD', key, current_time, member)
	-- The window is expressed in milliseconds, so use PEXPIRE (EXPIRE takes seconds).
	redis.call('PEXPIRE', key, window)
	return 0 -- Success
else
	return 1 -- Throttled
end
`

type RateLimiter struct {
	client *redis.Client
}

// Allow reports whether the request fits inside the user's quota for the window.
func (rl *RateLimiter) Allow(ctx context.Context, userID string, limit int, window time.Duration) (bool, error) {
	now := time.Now().UnixNano() / int64(time.Millisecond)
	windowMs := int64(window / time.Millisecond)
	// A nanosecond timestamp keeps ZSET members effectively unique, so two
	// requests landing in the same millisecond are still counted separately.
	member := time.Now().UnixNano()
	key := "ratelimit:" + userID
	res, err := rl.client.Eval(ctx, RateLimitLua, []string{key}, limit, windowMs, now, member).Result()
	if err != nil {
		return false, err
	}
	// 0 means allowed, 1 means throttled.
	return res.(int64) == 0, nil
}

This implementation uses a Redis Sorted Set (ZSET), i.e. the exact Sliding Window Log variant. While a ZSET is more memory-intensive than a simple counter, it provides perfect accuracy for sliding windows. For massive scale, one might switch to the hybrid Sliding Window Counter approach: using INCR with two counters (current and previous window) to approximate the sliding window.
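A rough sketch of that counter-based approximation follows, written as a separate file in the same package so it can extend the RateLimiter above. The rl:<user>:<window index> key scheme and the pipeline usage are illustrative assumptions; the previous window's count is weighted by how much of it still overlaps the sliding window.

package ratelimit

import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// ApproxAllow estimates the sliding-window count from two fixed-window
// counters: the current window plus the previous one, weighted by overlap.
func (rl *RateLimiter) ApproxAllow(ctx context.Context, userID string, limit int, window time.Duration) (bool, error) {
	now := time.Now()
	windowMs := window.Milliseconds()
	currWindow := now.UnixMilli() / windowMs
	// Fraction of the current fixed window that has already elapsed.
	elapsed := float64(now.UnixMilli()%windowMs) / float64(windowMs)

	currKey := fmt.Sprintf("rl:%s:%d", userID, currWindow)
	prevKey := fmt.Sprintf("rl:%s:%d", userID, currWindow-1)

	// One round-trip: bump the current counter, refresh its TTL, read the previous one.
	pipe := rl.client.Pipeline()
	currCmd := pipe.Incr(ctx, currKey)
	pipe.Expire(ctx, currKey, 2*window) // keep the counter around for one extra window
	prevCmd := pipe.Get(ctx, prevKey)
	if _, err := pipe.Exec(ctx); err != nil && err != redis.Nil {
		return false, err
	}

	prev, _ := prevCmd.Int64() // a missing previous window simply counts as zero
	estimated := float64(currCmd.Val()) + float64(prev)*(1-elapsed)
	return estimated <= float64(limit), nil
}

Note that this variant also counts rejected requests against the current window, which is usually acceptable (and arguably desirable) for abusive clients.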
Database Schema
The persistent storage (PostgreSQL) manages the rate-limiting policies, while Redis handles the transient counters. Policies are cached locally in the API Gateway to avoid hitting PostgreSQL on every request.
CREATE TABLE rate_limit_policies (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    resource_path VARCHAR(255) NOT NULL,
    request_limit INTEGER NOT NULL,
    window_seconds INTEGER NOT NULL,
    tier VARCHAR(50) DEFAULT 'FREE',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_policy_path_tier ON rate_limit_policies(resource_path, tier);
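A minimal sketch of that local policy cache follows. The PolicyStore interface, the cache key format, and the TTL-based invalidation are illustrative assumptions; the store itself would run the SQL above against PostgreSQL.

package gateway

import (
	"context"
	"sync"
	"time"
)

// Policy mirrors a row in rate_limit_policies.
type Policy struct {
	RequestLimit  int
	WindowSeconds int
}

// PolicyStore is whatever loads policies from PostgreSQL.
type PolicyStore interface {
	Load(ctx context.Context, resourcePath, tier string) (Policy, error)
}

type cachedPolicy struct {
	policy    Policy
	expiresAt time.Time
}

// PolicyCache keeps recently used policies in local memory so the gateway
// does not query PostgreSQL on every request.
type PolicyCache struct {
	store PolicyStore
	ttl   time.Duration
	mu    sync.RWMutex
	items map[string]cachedPolicy
}

func NewPolicyCache(store PolicyStore, ttl time.Duration) *PolicyCache {
	return &PolicyCache{store: store, ttl: ttl, items: make(map[string]cachedPolicy)}
}

func (c *PolicyCache) Get(ctx context.Context, resourcePath, tier string) (Policy, error) {
	key := resourcePath + "|" + tier
	c.mu.RLock()
	entry, ok := c.items[key]
	c.mu.RUnlock()
	if ok && time.Now().Before(entry.expiresAt) {
		return entry.policy, nil
	}
	// Cache miss or stale entry: reload from PostgreSQL and refresh the cache.
	policy, err := c.store.Load(ctx, resourcePath, tier)
	if err != nil {
		return Policy{}, err
	}
	c.mu.Lock()
	c.items[key] = cachedPolicy{policy: policy, expiresAt: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return policy, nil
}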
Scaling Strategy
Scaling from 1,000 to 1,000,000+ users requires moving from a single Redis instance to a distributed Redis Cluster. However, the biggest bottleneck is often the network latency between the Gateway and Redis. To solve this, we implement L1/L2 Caching.
- L1 (Local Memory): Each Gateway node maintains a small, short-lived cache of "known throttled users." If a user is already well over their limit, we block them at the edge without checking Redis (a minimal sketch follows this list).
- L2 (Redis Cluster): The global source of truth.
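A rough sketch of the L1 layer, written against the RateLimiter from the Detailed Design section. The TTL, the map-based cache, and the choice to block on any throttle decision (rather than only egregious offenders) are simplifying assumptions.

package ratelimit

import (
	"context"
	"sync"
	"time"
)

// ThrottleCache remembers users who were recently throttled so that repeat
// offenders are rejected locally without a round-trip to Redis.
// A production version would cap the map size and evict expired entries.
type ThrottleCache struct {
	mu      sync.RWMutex
	blocked map[string]time.Time // userID -> time the local block expires
	ttl     time.Duration
}

func NewThrottleCache(ttl time.Duration) *ThrottleCache {
	return &ThrottleCache{blocked: make(map[string]time.Time), ttl: ttl}
}

func (t *ThrottleCache) IsBlocked(userID string) bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	until, ok := t.blocked[userID]
	return ok && time.Now().Before(until)
}

func (t *ThrottleCache) Block(userID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.blocked[userID] = time.Now().Add(t.ttl)
}

// AllowCached consults the local cache first (L1) and only then Redis (L2).
func (rl *RateLimiter) AllowCached(ctx context.Context, l1 *ThrottleCache, userID string, limit int, window time.Duration) (bool, error) {
	if l1.IsBlocked(userID) {
		return false, nil
	}
	allowed, err := rl.Allow(ctx, userID, limit, window)
	if err == nil && !allowed {
		l1.Block(userID) // remember the offender briefly at the edge
	}
	return allowed, err
}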
Failure Modes and Resiliency
In a distributed system, the rate limiter must not become a Single Point of Failure (SPOF). If the Redis cluster experiences a partition or high latency, the system should "fail open." It is better to allow a few extra requests through than to block all legitimate traffic because the rate limiter is down.
We implement a Circuit Breaker around the Redis call. If the failure rate exceeds a threshold, the breaker opens, and the Gateway allows all traffic for a cooling-off period while alerting the SRE team.
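A minimal sketch of that wrapper is shown below. For simplicity it trips on consecutive failures rather than a true failure-rate window, and the thresholds are illustrative; in practice a library such as sony/gobreaker is a common choice.

package ratelimit

import (
	"context"
	"sync"
	"time"
)

// BreakerLimiter wraps the Redis-backed limiter and fails open when Redis is
// unhealthy, so the rate limiter never becomes the outage.
type BreakerLimiter struct {
	inner *RateLimiter

	mu          sync.Mutex
	failures    int
	maxFailures int           // consecutive failures before the breaker opens
	openUntil   time.Time     // while open, skip Redis entirely
	coolDown    time.Duration // how long the breaker stays open
}

func NewBreakerLimiter(inner *RateLimiter, maxFailures int, coolDown time.Duration) *BreakerLimiter {
	return &BreakerLimiter{inner: inner, maxFailures: maxFailures, coolDown: coolDown}
}

func (b *BreakerLimiter) Allow(ctx context.Context, userID string, limit int, window time.Duration) (bool, error) {
	b.mu.Lock()
	open := time.Now().Before(b.openUntil)
	b.mu.Unlock()
	if open {
		return true, nil // fail open: let traffic through while Redis recovers
	}

	allowed, err := b.inner.Allow(ctx, userID, limit, window)

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openUntil = time.Now().Add(b.coolDown)
			b.failures = 0
			// In production this is also where the SRE alert would fire.
		}
		return true, nil // fail open on individual errors as well
	}
	b.failures = 0
	return allowed, nil
}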
Conclusion
Designing a distributed rate limiter involves a delicate balance between accuracy and performance. While a perfectly synchronized global counter is ideal for strict enforcement, the CAP theorem reminds us that in the face of network partitions, we must choose between consistency and availability.
For most production environments, an AP (Available and Partition-tolerant) approach is preferred. By utilizing Redis for global state, Lua scripts for atomicity, and local L1 caches for high-volume offenders, we can build a system that scales linearly. The key takeaway for any staff engineer is that the rate limiter should be "invisible" to legitimate users while remaining a robust shield against system abuse.