Load Balancer¶
Interview Time: 60-90 min | Difficulty: Hard
Key Focus: Request routing, health checks, session affinity, distribution algorithms
Step 1: Functional & Non-Functional Requirements¶
Functional Requirements¶
- Distribute incoming traffic across multiple backend servers
- Support multiple load balancing algorithms (round-robin, least connections, weighted)
- Health checking: detect and remove failed servers
- Session persistence (sticky sessions) for stateful applications
- SSL/TLS termination (decrypt at load balancer)
- Support multiple protocols (HTTP, HTTPS, TCP, UDP)
- Geographic routing (route to nearest data center)
- Rate limiting and DDoS protection
- Connection pooling and keep-alive support
- Request logging and metrics collection
Non-Functional Requirements¶
| Requirement | Target | Notes |
|---|---|---|
| Throughput | 1M requests/sec | Single load balancer |
| Latency | Adds <5ms overhead | Routing decision + proxy hop |
| Availability | 99.99% uptime | Active-passive failover |
| Scalability | 10,000+ backend servers | Consistent hashing for distribution |
| Failover | <1 second detection + redirect | Minimal request loss on failure |
| Connection limits | 1M concurrent connections per LB | Connection pooling |
| Geographic | Route within 50ms of user | CDN + DNS-based geo-routing |
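A quick back-of-envelope check makes these targets concrete (every constant below is an illustrative assumption, not a measurement):
# Rough sizing sketch; all numbers are assumptions for illustration
concurrent_connections = 1_000_000
bytes_per_connection = 10 * 1024  # assume ~10 KB of kernel + user-space state per connection
state_gb = concurrent_connections * bytes_per_connection / 1e9
print(f"Connection state: ~{state_gb:.0f} GB of RAM")  # ~10 GB

requests_per_sec = 1_000_000
budget_us = 1e6 / requests_per_sec  # microseconds per request on a single core
print(f"Per-core budget: {budget_us:.0f} us/request")  # 1 us: needs event-driven I/O across many cores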
Step 2: API Design, Data Model & High-Level Design¶
Core API Endpoints (Load Balancer Control Plane)¶
POST /backends
{id: "web-1", host: "10.0.1.1", port: 8080, weight: 1, health_check_path: "/health"}
→ {id, status: REGISTERED}
GET /backends
→ {backends: [{id, host, port, weight, status: HEALTHY|UNHEALTHY}]}
DELETE /backends/{id}
→ {status: DEREGISTERED}
POST /routing-rules
{path_prefix: "/api", algorithm: "least_connections", backends: ["web-1", "web-2"]}
→ {rule_id}
GET /health
→ {load_balancer_id, active_connections, requests_per_sec, backend_health: {...}}
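As a usage sketch, registering a backend through the control plane could look like this (the base URL is hypothetical; the request and response shapes follow the API above):
import requests

CONTROL_PLANE = "http://lb-control.internal:9090"  # hypothetical control-plane address

resp = requests.post(f"{CONTROL_PLANE}/backends", json={
    "id": "web-1",
    "host": "10.0.1.1",
    "port": 8080,
    "weight": 1,
    "health_check_path": "/health",
})
print(resp.json())  # expect {"id": "web-1", "status": "REGISTERED"}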
Entity Data Model¶
BACKENDS
├─ backend_id (PK)
├─ host, port
├─ weight (affects distribution)
├─ status (HEALTHY|UNHEALTHY|DRAINING)
├─ last_health_check_at
├─ consecutive_failures
├─ max_consecutive_failures (to mark unhealthy)
ROUTING_RULES
├─ rule_id (PK)
├─ path_prefix, host_header_match
├─ algorithm (ROUND_ROBIN|LEAST_CONN|IP_HASH|WEIGHTED)
├─ backend_ids (list of available backends)
├─ session_sticky: bool
ACTIVE_CONNECTIONS
├─ connection_id (PK)
├─ client_ip, client_port
├─ backend_id, backend_connection_id
├─ session_id (if sticky)
├─ created_at, last_activity_at
├─ data_sent_bytes, data_received_bytes
METRICS
├─ timestamp_ms, backend_id
├─ requests_count, error_count
├─ avg_response_time_ms, p99_latency_ms
├─ active_connections, bytes_in, bytes_out
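A minimal in-memory representation of the BACKENDS entity, reused by the code sketches later in this section (field names mirror the model above; the status constants are plain strings for brevity):
from dataclasses import dataclass

# Status constants shared by the snippets in this section
HEALTHY, UNHEALTHY, DRAINING = "HEALTHY", "UNHEALTHY", "DRAINING"

@dataclass
class Backend:
    id: str
    host: str
    port: int
    weight: int = 1
    health_check_path: str = "/health"
    status: str = HEALTHY
    consecutive_failures: int = 0
    active_connections: int = 0
    last_successful_check_at: float = 0.0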
High-Level Architecture¶
graph TB
Client["Client"]
InternetGW["Internet Gateway<br/>(VPC boundary)"]
LB["Load Balancer<br/>(Active)"]
LB_Standby["Load Balancer<br/>(Standby)<br/>Hot failover"]
HealthChecker["Health Checker<br/>(async)"]
ConnectionRouter["Connection Router<br/>(consistent hash)"]
RateLimiter["Rate Limiter<br/>(token bucket)"]
Backend1["Backend 1<br/>(healthy)"]
Backend2["Backend 2<br/>(healthy)"]
Backend3["Backend 3<br/>(failed)"]
MetricsCollector["Metrics Collector<br/>(Prometheus)"]
ControlPlane["Control Plane<br/>(API)"]
ConnectionPool["Connection Pool<br/>(TCP keep-alive)"]
Client --> InternetGW
InternetGW --> LB
LB --> RateLimiter
RateLimiter --> ConnectionRouter
ConnectionRouter --> Backend1
ConnectionRouter --> Backend2
ConnectionRouter -.->|skipped| Backend3
LB --> ConnectionPool
ConnectionPool --> Backend1
LB -.->|heartbeat| HealthChecker
HealthChecker -->|health_check| Backend1
HealthChecker -->|health_check| Backend3
HealthChecker -->|update_status| LB
LB --> MetricsCollector
ControlPlane -->|configure| LB
LB -.->|sync_state| LB_Standby
Step 3: Concurrency, Consistency & Scalability¶
🔴 Problem: Uneven Load Distribution with Dynamic Backends¶
Scenario: Backend servers are added and removed, or weights change. Simple round-robin distributes requests evenly by count, not by capacity: if server A has weight 3 and server B has weight 1, alternating between them sends each 50% of the traffic and overloads B.
Solutions:
| Approach | Implementation | Pros | Cons |
|---|---|---|---|
| Simple Round-Robin | Cycle through backends | Easy, fair if static | No weight support, ignores capacity |
| Weighted Round-Robin | Allocate slots by weight | Fair distribution | Complex scheduling on weight changes |
| Least Connections | Route to server with fewest active conns | Adaptive to load | O(N) lookup, complex with weights |
| Consistent Hashing | Hash client IP → server | Preserves sessions, stable when backends change | Uneven load without virtual nodes |
Recommended: Least Connections (weighted) for general use; Consistent Hashing for session persistence
import bisect
import hashlib

class NoHealthyBackendError(Exception):
    pass

class LoadBalancer:
    def select_backend_least_conn(self, backends, weights):
        """Select the healthy backend with the fewest active connections per unit of weight."""
        weighted_candidates = [
            # Normalize by weight so a weight-3 server may carry 3x the connections of weight-1
            (backend, backend.active_connections / max(weights[backend.id], 1))
            for backend in backends if backend.status == HEALTHY
        ]
        if not weighted_candidates:
            raise NoHealthyBackendError()
        # Select backend with the lowest weighted connection count
        selected = min(weighted_candidates, key=lambda x: x[1])
        return selected[0]

    def select_backend_consistent_hash(self, client_ip, backends, replicas=100):
        """Consistent hashing for session stickiness."""
        # Python's built-in hash() is randomized per process; use a stable digest instead
        def stable_hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        # Build the hash ring; virtual nodes (replicas) smooth out hotspots
        hash_ring = {}
        for backend in backends:
            if backend.status != HEALTHY:
                continue
            for i in range(replicas):
                hash_ring[stable_hash(f"{backend.id}#{i}")] = backend
        if not hash_ring:
            raise NoHealthyBackendError()
        # Find the next token clockwise from the client's position on the ring
        sorted_tokens = sorted(hash_ring.keys())
        idx = bisect.bisect_right(sorted_tokens, stable_hash(client_ip))
        if idx == len(sorted_tokens):
            idx = 0  # wrap around
        return hash_ring[sorted_tokens[idx]]
Example Traffic Distribution (Weighted Round-Robin):
Backends: [A (weight=3), B (weight=1), C (weight=2)]
Distribution cycle: [A, B, A, C, A, C] (6 requests: A gets 3, C gets 2, B gets 1)
If a server is removed (say C becomes unavailable):
Adjust: [A, A, B, A] (4 requests: A gets 3, B gets 1, preserving the 3:1 weight ratio)
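A standard way to generate such an interleaved cycle is nginx-style smooth weighted round-robin. A minimal sketch, with an illustrative WeightedBackend class (in practice current_weight would live on the Backend object):
from dataclasses import dataclass

@dataclass
class WeightedBackend:
    id: str
    weight: int
    current_weight: int = 0

def smooth_weighted_round_robin(backends):
    """One pick of smooth WRR: spreads high-weight backends evenly through the cycle."""
    total = sum(b.weight for b in backends)
    for b in backends:
        b.current_weight += b.weight  # every backend earns its weight
    selected = max(backends, key=lambda b: b.current_weight)
    selected.current_weight -= total  # the winner pays back the total
    return selected

pool = [WeightedBackend("A", 3), WeightedBackend("B", 1), WeightedBackend("C", 2)]
print([smooth_weighted_round_robin(pool).id for _ in range(6)])
# ['A', 'C', 'A', 'B', 'C', 'A']: same 3:1:2 counts, evenly interleaved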
🟡 Problem: Connection Draining & Graceful Shutdown¶
Scenario: An operator wants to remove a backend server for maintenance or an upgrade, but it still has 10,000 active connections. Dropping them would surface errors to clients.
Solution: Graceful drain: mark server as DRAINING, stop accepting new connections, let existing complete
import logging
import time

class BackendManager:
def drain_backend(self, backend_id: str):
"""Gracefully drain backend before removal"""
backend = self.get_backend(backend_id)
backend.status = DRAINING # Stop accepting new connections
logging.info(f"Draining {backend_id}, active conns: {backend.active_connections}")
# Wait for existing connections to close
max_wait_sec = 300 # 5 minute timeout
start_time = time.time()
while backend.active_connections > 0:
elapsed = time.time() - start_time
if elapsed > max_wait_sec:
logging.warning(f"Drain timeout after {max_wait_sec}s, force-closing {backend.active_connections} connections")
self.force_close_connections(backend_id)
break
# Wait, then check again
time.sleep(5)
logging.info(f"Draining {backend_id}, remaining conns: {backend.active_connections}")
# Remove from load balancer
self.deregister_backend(backend_id)
logging.info(f"Backend {backend_id} fully drained")
    def handle_client_request(self, client_ip):
        """Route new requests to HEALTHY backends only; DRAINING is a last resort."""
        candidates = [
            b for b in self.backends
            if b.status in (HEALTHY, DRAINING)
        ]
        # Prefer HEALTHY backends for new connections
        healthy_only = [b for b in candidates if b.status == HEALTHY]
        if healthy_only:
            return self.select_backend(healthy_only)
        # Rare fallback: if nothing is HEALTHY, a DRAINING backend beats an outright error
        return self.select_backend(candidates)
💾 Problem: Detecting Backend Failures Quickly¶
Scenario: A backend server crashes. The LB should detect the failure within 1-2 seconds and stop routing traffic to it, but periodic health checks alone are too slow: they run only every few seconds, and probing every backend more often is expensive.
Solution: Connection timeout detection + periodic health checks
import logging
import time
from concurrent.futures import ThreadPoolExecutor

import requests

class HealthChecker:
    def __init__(self, backends, check_interval_sec=5, failure_threshold=3):
        self.backends = backends
        self.check_interval_sec = check_interval_sec
        self.failure_threshold = failure_threshold  # mark unhealthy after N consecutive failures
        self.thread_pool = ThreadPoolExecutor(max_workers=32)

    def health_check_backend(self, backend):
        """Probe the backend's health endpoint and update its status."""
        try:
            response = requests.get(
                f"http://{backend.host}:{backend.port}{backend.health_check_path}",
                timeout=5,
            )
            if response.status_code == 200:
                backend.consecutive_failures = 0
                backend.status = HEALTHY
                backend.last_successful_check_at = time.time()
                return
        except requests.RequestException:
            pass  # timeouts and connection errors count as failures
        backend.consecutive_failures += 1
        if backend.consecutive_failures >= self.failure_threshold:
            backend.status = UNHEALTHY

    def run_health_checks(self):
        """Periodically fan out health checks to all backends."""
        while True:
            for backend in self.backends:
                # Run each check in the thread pool so a slow backend can't block the loop
                self.thread_pool.submit(self.health_check_backend, backend)
            time.sleep(self.check_interval_sec)

    def detect_connection_timeout(self, backend_id):
        """Fast path: a timeout on live traffic marks the backend unhealthy immediately."""
        backend = self.get_backend(backend_id)
        if backend.status == HEALTHY:
            logging.warning(f"Connection timeout to {backend_id}, marking UNHEALTHY")
            backend.status = UNHEALTHY
            backend.consecutive_failures = self.failure_threshold
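run_health_checks blocks forever, so it would typically run on a daemon thread beside the data plane; for example, using the Backend sketch from Step 2:
import threading

backends = [Backend(id="web-1", host="10.0.1.1", port=8080)]  # loaded from the registry
checker = HealthChecker(backends)
threading.Thread(target=checker.run_health_checks, daemon=True).start()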
Step 4: Persistence Layer, Caching & Monitoring¶
Database Design (In-Memory State + Persistent Log)¶
-- Backend configuration (persistent)
CREATE TABLE backends (
backend_id TEXT PRIMARY KEY,
host TEXT,
port INT,
weight INT,
health_check_path TEXT,
max_connections INT,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
-- Routing rules configuration
CREATE TABLE routing_rules (
rule_id TEXT PRIMARY KEY,
path_prefix TEXT,
algorithm TEXT,
backend_ids LIST<TEXT>,
session_sticky BOOLEAN,
created_at TIMESTAMP
);
-- Active connections log (for debugging)
CREATE TABLE active_connections (
connection_id TEXT PRIMARY KEY,
client_ip TEXT,
backend_id TEXT,
created_at TIMESTAMP,
last_activity_at TIMESTAMP,
bytes_in BIGINT,
bytes_out BIGINT
) WITH default_time_to_live = 604800; -- 7 days (Cassandra-style TTL)
-- Connection metrics (time-series)
CREATE TABLE lb_metrics (
timestamp_ms BIGINT,
backend_id TEXT,
requests_count INT,
error_count INT,
avg_latency_ms INT,
p99_latency_ms INT,
active_connections INT,
  PRIMARY KEY (backend_id, timestamp_ms)
) WITH CLUSTERING ORDER BY (timestamp_ms DESC)
  AND compression = {'class': 'ZstdCompressor'};
CREATE INDEX idx_active_connections_backend ON active_connections(backend_id);
Caching Strategy¶
TIER 1: LB In-Memory State
├─ [Backend ID] → {host, port, weight, status, active_conns} (refresh on health check)
├─ Routing rules → [backends] (cached, updated on rule change)
└─ Connection sessions → {client_ip → backend_id} (TTL: connection lifetime)
TIER 2: Connection Pooling
├─ Keep-Alive HTTP connections to backends (reuse TCP)
├─ Pool size: min 10, max 1000 per backend
└─ Idle timeout: 60 seconds
Invalidation:
- Backend added/removed → update in-memory state synchronously
- Backend status change → update active routing immediately
- Health check result → update backend.status
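Tier 2's keep-alive pooling maps directly onto HTTP client connection pools. A sketch using the requests library's built-in pooling (pool sizes mirror the numbers above; the per-backend session map is illustrative):
import requests
from requests.adapters import HTTPAdapter

def make_pooled_session() -> requests.Session:
    """Session that keeps TCP connections to a backend alive and reuses them."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=10, pool_maxsize=1000)  # up to 1000 reusable conns per host
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

# One pooled session per backend; repeated requests reuse warm TCP connections
sessions = {"web-1": make_pooled_session()}
resp = sessions["web-1"].get("http://10.0.1.1:8080/health", timeout=5)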
Monitoring & Alerts¶
Key Metrics:
- Request Distribution — Requests per backend (should be proportional to weight)
- Backend Health — Unhealthy count, detection latency
- Connection Pool — Active connections, timeout rate
- Latency — p50/p99 response time, difference between backends
- Error Rate — 4xx/5xx rate per backend, total error rate
- alert: UnhealthyBackends
  expr: count(backend_status == UNHEALTHY) > 0
  for: 1m
  annotations: "{{$value}} backends unhealthy, capacity reduced"
- alert: BackendImbalance
  expr: max(backend_request_rate) / min(backend_request_rate) > 2
  for: 5m
  annotations: "Uneven load distribution ({{$value}}x difference)"
- alert: HighErrorRate
  expr: rate(http_error_total[5m]) / rate(http_requests_total[5m]) > 0.01
  annotations: "Error rate >1%, investigate backends"
- alert: ConnectionPoolExhausted
  expr: active_connections / max_connections > 0.90
  annotations: "Connection pool >90% full ({{$value | humanizePercentage}})"
- alert: LoadBalancerFailover
  expr: changes(active_load_balancer[1m]) > 0
  annotations: "Load balancer failover detected"
- alert: HealthCheckLatency
  expr: health_check_duration_ms > 1000
  annotations: "Health checks slow ({{$value}}ms), check network"
Grafana Dashboard Metrics:
Requests per second (by backend):
  sum by (backend_id) (rate(http_requests_total[1m]))
Latency distribution (repeat for 0.50 / 0.95 / 0.99):
  histogram_quantile(0.99, sum by (le) (rate(http_request_latency_bucket[5m])))
Backend health status:
  backend_status{status=~"HEALTHY|UNHEALTHY"}
Active connections:
  sum by (backend_id) (active_connections)
Bytes transferred:
  rate(bytes_in_total[1m]), rate(bytes_out_total[1m])
⚡ Quick Reference Cheat Sheet¶
When to Use What¶
| Need | Technology | Why |
|---|---|---|
| Even distribution | Least Connections (weighted) | Adapts to varying server capacity |
| Session stickiness | Consistent Hashing | Client always routes to same backend |
| Fast failure detection | Connection timeout + periodic health checks | Detects crashes within 1-2 sec |
| Graceful shutdown | Status DRAINING + wait for connections | Zero request loss |
| SSL termination | Hardware LB or dedicated TLS proxy | Offloads crypto work from backends |
| Connection pooling | TCP keep-alive + connection reuse | Reduces latency, improves throughput |
Critical Design Decisions¶
- Least Connections (weighted): Better than round-robin for variable server capacity
- Active-passive failover: Standby LB takes over in under 1 second if the primary fails
- Per-backend rate limiting: Prevent one slow backend from dragging down others (see the token-bucket sketch after this list)
- Graceful drain: Mark DRAINING before removal, wait for connections to close
- Periodic health checks + timeout detection: Hybrid approach for quick failure detection
- Connection pooling: Reuse TCP connections to backends (lower latency, fewer sockets)
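Per-backend rate limiting is typically a token bucket per backend, as referenced above. A minimal sketch (rates and capacities are illustrative):
import time

class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per backend: a saturated backend sheds load without starving the others
limiters = {"web-1": TokenBucket(rate=1000, capacity=2000)}
if not limiters["web-1"].allow():
    pass  # return 429 or route to another backend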
Tech Stack Summary¶
Load Balancer: HAProxy, Nginx, F5, Envoy
Health Checks: Custom HTTP endpoint /health
Metrics: Prometheus + Grafana
Failover: Active-passive with keepalived/heartbeat
Session Persistence: Consistent hashing or cookie-based
🎯 Interview Summary (5 Minutes)¶
- Distribute traffic: Route to backend with fewest active connections (weighted by capacity)
- Health checks: Async periodic checks + immediate timeout detection to identify failures (<2 sec)
- Session persistence: Use consistent hashing (hash client IP → backend) if needed
- Graceful drain: Mark server DRAINING before removal, wait for active connections to close
- Connection pooling: Reuse TCP connections to backends with keep-alive (lower latency)
- Failover: Active-passive LB pair; standby takes over in under 1 second if the primary fails
- Rate limiting: Per-backend limits prevent slow backends from dragging down fast ones