Web Cache

Interview Time: 60 min | Difficulty: Medium
Key Focus: Cache invalidation, LRU eviction, consistency, multi-level caching


Step 1: Functional & Non-Functional Requirements

Functional Requirements

  • Cache frequently accessed data (database query results, API responses)
  • Return cached data if available (cache hit), fetch from origin if miss
  • Eviction policies: LRU (least recently used), LFU (least frequently used), TTL
  • Cache invalidation: explicit purge, TTL expiration, conditional updates
  • Support for different data types (objects, strings, lists, sets)
  • Enable/disable caching per data item
  • Performance analytics (hit ratio, eviction stats)
  • Multi-tier caching (L1: in-process, L2: service cache, L3: distributed)

Non-Functional Requirements

Requirement     Target                       Notes
Latency         <5ms on cache hit            Memory-based; <1ms ideal
Hit ratio       80%+ for hot data            Cold data has a lower hit ratio
Availability    99.9% (non-critical)         Cache loss = re-fetch, no data lost
Throughput      100K ops/sec per instance    Scale out via sharding
Memory usage    Bounded by max_memory        Evict least valuable entries when full
Consistency     Eventual (stale reads OK)    Stale data acceptable with short TTLs

Step 2: API Design, Data Model & High-Level Design

Core API Endpoints/Operations

GET /cache/{key}
  → {value, hit_count, last_accessed, ttl_remaining_seconds}
  → If miss: fetch from database, store in cache, return
  → If hit: update access time, return value

SET /cache/{key}
  {value, ttl_seconds: 3600, eviction_policy: LRU}
  → {success: true, stored_size_bytes, eviction_count}

DELETE /cache/{key}
  → {success: true, freed_size_bytes}

FLUSH /cache
  → {cleared_count, freed_size_bytes}

GET /cache/stats
  → {hits, misses, hit_ratio, eviction_count, memory_used, memory_max}

INVALIDATE /cache
  {keys: [key1, key2, key3]}
  → {invalidated_count, eviction_count}
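
A minimal sketch of the read path behind GET /cache/{key}, in Python; the dict-backed store and the fetch_from_db callable are illustrative stand-ins for the real cache and origin database.

import time

cache: dict = {}   # key -> {value, expires_at, hit_count, last_accessed}

def cache_get(key, fetch_from_db, ttl_seconds=3600):
    now = time.time()
    entry = cache.get(key)
    if entry and entry["expires_at"] > now:
        # Hit: update access metadata, return the cached value
        entry["hit_count"] += 1
        entry["last_accessed"] = now
        return entry["value"]
    # Miss (or expired): fetch from the origin, store, return
    value = fetch_from_db(key)
    cache[key] = {"value": value, "expires_at": now + ttl_seconds,
                  "hit_count": 0, "last_accessed": now}
    return value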

Entity Data Model

CACHE_ENTRY
├─ key (PK, string - "user:123", "post:456")
├─ value (binary, up to max_value_size)
├─ size_bytes (for memory management)
├─ created_at (timestamp)
├─ last_accessed_at (for LRU)
├─ access_count (for LFU)
├─ ttl_seconds (0 = no expiry)
├─ expires_at (created_at + ttl_seconds)
├─ priority (for weighted eviction)
├─ flags (pinned=true for never-evict)

CACHE_STATS (aggregated per shard/instance)
├─ timestamp
├─ hits_count
├─ misses_count
├─ evictions_count
├─ memory_used_bytes
├─ memory_max_bytes
├─ entries_count
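
As a sketch, the CACHE_ENTRY record maps onto a small Python dataclass; field names follow the model above, while the defaults are assumptions.

from dataclasses import dataclass, field
import time

@dataclass
class CacheEntry:
    key: str                  # e.g. "user:123"
    value: bytes
    size_bytes: int           # for memory accounting
    ttl_seconds: int = 0      # 0 = no expiry
    created_at: float = field(default_factory=time.time)
    last_accessed_at: float = field(default_factory=time.time)  # for LRU
    access_count: int = 0     # for LFU
    priority: int = 0         # for weighted eviction
    pinned: bool = False      # never evict when True

    @property
    def expires_at(self) -> float:
        return 0.0 if self.ttl_seconds == 0 else self.created_at + self.ttl_seconds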

High-Level Architecture

graph TB
    Client["👤 Client<br/>(Web Browser)"]
    LB["Load Balancer"]

    APP_TIER["Application Tier<br/>(compute service)"]

    L1_CACHE["L1 Cache<br/>(In-Process)<br/>1MB per instance"]

    L2_REDIS["L2 Cache<br/>(Redis Cluster)<br/>10GB per shard<br/>12 shards total"]

    L3_DB["L3 Storage<br/>(Database)"]

    CACHE_INVALIDATOR["Cache Invalidator<br/>(async events)"]

    Client -->|GET /user/123| LB
    LB -->|route| APP_TIER

    APP_TIER -->|check L1| L1_CACHE

    L1_CACHE -->|miss| L2_REDIS
    L2_REDIS -->|miss| L3_DB

    L3_DB -->|load + cache| L2_REDIS
    L2_REDIS -->|load + cache| L1_CACHE
    L1_CACHE -->|return| APP_TIER

    CACHE_INVALIDATOR -->|invalidate| L1_CACHE
    CACHE_INVALIDATOR -->|invalidate| L2_REDIS
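
The read path through the tiers might look like this sketch; l1 is an in-process dict, while l2_get, l2_set, and db_load are hypothetical wrappers around Redis and the database.

l1: dict = {}                        # L1: in-process, ~1MB per instance

def multi_tier_get(key, l2_get, l2_set, db_load):
    if key in l1:                    # L1 hit: no network hop
        return l1[key]
    value = l2_get(key)              # L2: Redis cluster
    if value is None:
        value = db_load(key)         # L3: database, the source of truth
        l2_set(key, value)           # populate L2 on the way back up
    l1[key] = value                  # populate L1 for later requests
    return value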

Step 3: Concurrency, Consistency & Scalability

🔴 Problem: Cache Stampede (Thundering Herd)

Scenario: A popular item is cached. Its entry expires. 1000 requests hit simultaneously, and all 1000 go to the database to refresh the cache. The database is overloaded.

Solution: Probabilistic Early Expiration

Cache entry: {key: "trending_items", ttl: 3600, created: 14:00}

Normal expiration:
  Request at 15:00: Cache expired, fetch from DB

Stampede-prone scenario (popular item):
  Requests: 1000/sec
  At 15:00:00 → Cache expires
  Requests 15:00:00 to 15:00:10 → all miss
  Database load: 1000/sec × 10 sec = 10,000 queries!
  → DB overload

Solution: Probabilistic early refresh

Cache entry:
  ttl = 3600
  early_refresh_start = ttl * 0.8 = 2880 (80% of TTL)
  early_refresh_end = ttl (100% of TTL)

At time 14:56 (inside the window, which opens at 14:48 = 80% of TTL):
  Request arrives
  Is item within early refresh window?  YES

  Probability P = time_since_window_start / window_duration
  P = (14:56 - 14:48) / (3600 * 0.2)
    = 480 / 720 ≈ 0.67

  random.random() < 0.67:  YES → refresh (≈67% of requests attempt a refresh)
                           NO  → serve stale cache

Acquire lock:
  SET refresh_lock:{key} {thread_id} NX EX 10   (lock auto-expires after 10 seconds)

  If acquired:
    Fetch fresh from DB
    Update cache TTL
    Release lock

  If not acquired (another thread refreshing):
    Serve stale cache
    Retry lock in 100ms
    Once lock released, use fresh

Result:
  - Only 1 DB call per expiration
  - Others serve stale (fast)
  - No thundering herd
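
A sketch combining both pieces, using redis-py for the lock (its set(..., nx=True, ex=10) maps to SET ... NX EX 10); refresh_from_db and the entry layout are illustrative assumptions.

import random
import time
import redis

r = redis.Redis()

def maybe_refresh(key, entry, ttl, refresh_from_db):
    age = time.time() - entry["created_at"]
    window_start = ttl * 0.8                  # early-refresh window opens at 80% of TTL
    if age >= window_start:
        # Refresh probability rises linearly from 0 at 80% of TTL to 1 at expiry
        p = (age - window_start) / (ttl * 0.2)
        if random.random() < p:
            # Only one caller wins the lock; everyone else keeps serving stale
            if r.set(f"refresh_lock:{key}", "1", nx=True, ex=10):
                try:
                    entry["value"] = refresh_from_db(key)
                    entry["created_at"] = time.time()
                finally:
                    r.delete(f"refresh_lock:{key}")
    return entry["value"]                     # fresh if refreshed, stale otherwise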

🟡 Problem: Inconsistency (Cache vs Database)

Scenario: The database is updated, but the cache still returns the old value. Users see stale data for hours.

Solutions:

1. WRITE-THROUGH (update cache + database atomically)

  Operation: update_item(id, new_value)

  Transaction:
    BEGIN
    1. Update database
       UPDATE items SET value = ? WHERE id = ?

    2. Update cache (same transaction)
       SET cache[key] = new_value
       TTL = 3600

    3. Log invalidation event
       PUBLISH cache_invalidation {operation: UPDATE, key, value}

    COMMIT

  Result:
    - Database and cache always consistent
    - If the transaction fails, neither write is applied (all-or-nothing)
    - Latency: DB write time + cache write time
    - Throughput: Limited by DB write latency
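
A hedged write-through sketch; db_execute is a hypothetical database helper and r is redis-py. A truly atomic commit across two stores needs extra machinery (e.g. an outbox table), so this version is best-effort sequential.

import json
import redis

r = redis.Redis()

def write_through(key, new_value, db_execute):
    # 1. Database first: it remains the source of truth
    db_execute("UPDATE items SET value = %s WHERE id = %s", (new_value, key))
    # 2. Cache second, with the TTL from the design above
    r.set(f"cache:{key}", new_value, ex=3600)
    # 3. Broadcast so other tiers (e.g. L1 copies) drop stale entries
    r.publish("cache_invalidation", json.dumps({"operation": "UPDATE", "key": key}))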

2. WRITE-BEHIND (write to cache first, journal to database)

  Operation: update_item(id, new_value)

  Synchronous:
    1. Write to cache (fast, <1ms)
       SET cache[key] = new_value
       → Client gets response immediately

  Asynchronous:
    2. Background: Write to database
       Every 1 second or 1000 updates:
         Submit batch to database
         If database write fails:
           Log error and retry in 10 sec

  Pros:
    - Cache reads always see the latest value (it is written there first)
    - Fast response to client

  Cons:
    - If cache crashes before DB write, data loss
    - Eventual consistency (DB may lag cache by seconds)

  Mitigate:
    - Use persistent cache (Redis with AOF/RDB)
    - Small batch window (100ms)
    - Replicate the cache (fail over to a replica)
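
A write-behind sketch under the same assumptions: the cache is a plain dict, and flush_batch is a hypothetical bulk database writer.

import queue
import threading
import time

cache: dict = {}
pending: queue.Queue = queue.Queue()

def update_item(key, value):
    cache[key] = value           # synchronous: client sees the new value at once
    pending.put((key, value))    # asynchronous: journal the write for the DB

def db_writer(flush_batch, interval=1.0):
    while True:
        time.sleep(interval)
        batch = []
        while not pending.empty():
            batch.append(pending.get())
        if batch:
            try:
                flush_batch(batch)           # one bulk write per interval
            except Exception:
                for item in batch:           # failed: retry on the next cycle
                    pending.put(item)

# Start the flusher with a real bulk writer in place of `print`:
threading.Thread(target=db_writer, args=(print,), daemon=True).start()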

3. CACHE-ASIDE (write directly to database, invalidate cache)

  Operation: update_item(id, new_value)

  1. Update database directly
     UPDATE items SET value = ? WHERE id = ?

  2. Invalidate cache
     DEL cache[key]

  3. Next read:
     GET cache[key] → miss
     GET database[key] → fetch fresh
     SET cache[key] → store for future

  Pros:
    - Simple, decoupled from database
    - Database write is primary source of truth
    - Cache can crash, just slower reads

  Cons:
    - Between DELETE and next GET: cache miss (slower)
    - Complexity: need to coordinate invalidation
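
The cache-aside pattern in sketch form; db_execute and db_fetch are hypothetical database helpers, and r is redis-py.

import redis

r = redis.Redis()

def update_item(key, value, db_execute):
    db_execute("UPDATE items SET value = %s WHERE id = %s", (value, key))
    r.delete(f"cache:{key}")             # invalidate rather than update in place

def read_item(key, db_fetch, ttl=3600):
    cached = r.get(f"cache:{key}")
    if cached is not None:
        return cached                    # hit
    value = db_fetch(key)                # miss: lazy-load from the database
    r.set(f"cache:{key}", value, ex=ttl) # store for future reads
    return value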

Solution: Sharding for Scalability

Problem: Single Redis instance = single bottleneck
Solution: Partition data across multiple shards

Hash partitioning (hash space split evenly across 12 shards):
  Shard 0: keys 0-2^32/12
  Shard 1: keys 2^32/12 to 2*2^32/12
  ...
  Shard 11: keys 11*2^32/12 to 2^32

For key "user:123":
  hash("user:123") % 12 = shard_id

  All operations on "user:123" go to same shard
  → Consistency within key
  → Parallelism across shards

Throughput scaling:
  Single shard: 100K ops/sec
  12 shards: 1.2M ops/sec!

Replicas for fault-tolerance:
  Shard 0 (primary): Leader
  Shard 0 (replica): Follower (reads, auto-promotes if primary fails)

  Read: Primary or replica (evenly distributed)
  Write: Primary only
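
Routing a key to its shard can be as small as the following sketch; the shard addresses are placeholders, and crc32 stands in for whatever stable hash the cluster uses.

import zlib

NUM_SHARDS = 12
SHARDS = [f"redis-shard-{i}:6379" for i in range(NUM_SHARDS)]

def shard_for(key: str) -> str:
    # crc32 is stable across processes, unlike Python's built-in hash()
    return SHARDS[zlib.crc32(key.encode()) % NUM_SHARDS]

# Every caller computes the same shard for "user:123"
print(shard_for("user:123"))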

Step 4: Persistence Layer, Caching & Monitoring

In-Memory Storage (Hash Map)

Typical implementation (Redis):

Data structure: Hash table (for O(1) lookup)
  bucket_0 → [entry1, entry2]
  bucket_1 → [entry3]
  bucket_2 → [entry4, entry5, entry6]
  ...

For each entry:
  Key:   string (usually <256 bytes)
  Value: serialized (JSON/binary)
  Size:  bytes (for memory accounting)

Memory management:

Total cache memory = max_memory (e.g., 10GB)
Used memory = sum(value.size + key.size + metadata)

Eviction when used > max_memory:

1. Remove expired entries
2. Remove entries marked for deletion
3. Apply eviction policy:

   a) LRU (Least Recently Used)
      - Evict the entry not accessed for the longest time
      - Track: last_accessed_at for each entry
      - On eviction: find min(last_accessed_at)
      - Complexity: O(N) with a full scan, O(1) with a linked LRU list
        (see the sketch below)

   b) LFU (Least Frequently Used)
      - Evict the entry with the lowest access_count
      - Track: access_count for each entry
      - On eviction: find min(access_count)
      - Resists one-off scans: a scan touches each key only once, so those
        low-count entries are evicted before frequently used ones

   c) Random
      - Evict random entry
      - O(1) operation
      - Less intelligent but fast

Eviction percentage:
  When memory exceeds max:
    Evict 25% of memory (e.g., delete 500 entries if 2000 total)
    Prevents constant evictions
    Trade-off: use more memory to avoid churn
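
A sketch of the LRU variant referenced above, using OrderedDict so recency tracking and eviction are both O(1); the byte-based max_memory and the evict-to-75% batching mirror the text, the rest is assumption.

from collections import OrderedDict

class LRUCache:
    def __init__(self, max_memory: int):
        self.max_memory = max_memory
        self.used = 0
        self.entries: OrderedDict = OrderedDict()   # oldest entry sits first

    def get(self, key):
        if key not in self.entries:
            return None                             # miss
        self.entries.move_to_end(key)               # mark most recently used
        return self.entries[key]

    def set(self, key, value: bytes):
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = value
        self.used += len(value)
        if self.used > self.max_memory:
            target = self.max_memory * 0.75         # evict ~25% to avoid churn
            while self.used > target and self.entries:
                _, evicted = self.entries.popitem(last=False)   # least recent first
                self.used -= len(evicted)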

Cluster Persistence

Redis Cluster with RDB + AOF:

RDB (Snapshot):
  Every 60 seconds or 1000 writes:
    Fork: copy-on-write child writes the snapshot (only a brief ~100ms pause to fork)
    Snapshot: Write all entries to disk
    Compress: gzip to 30% size
    Store: S3 + local disk

  Fast recovery: Load RDB on startup
  Data loss risk: Up to 1 minute of writes if crash

AOF (Append-Only File):
  Every write: Append to log
  fsync: Every 1 second (batch multiple writes)
  Durability: at most the last 1 second of writes can be lost

  Replay: On startup, replay log to reconstruct state
  Compaction: Periodically rewrite (remove obsolete ops)

Recovery procedure:
  1. Load latest RDB (fast, few seconds)
  2. Replay AOF since RDB timestamp (may take longer)

  Result: At most ~1 second of data loss (with AOF + fsync every second)
          Recovery time: seconds to minutes, proportional to AOF size

Monitoring & Alerts

Key Metrics:

  1. Cache Hit Rate
     - hits / (hits + misses) = % of requests served from cache
     - Target: 80%+ for hot data, 10-20% for cold
     - Low hit rate → increase cache size or TTL

  2. Cache Performance
     - Hit latency: <5ms (ideally <1ms)
     - Miss latency: <100ms (depends on the database)
     - Eviction rate: entries/sec evicted
     - Memory usage: % of max_memory

  3. Consistency
     - Staleness: how old is cached data vs. the source of truth?
     - TTL misses: items that expired without being accessed
     - Invalidation lag: time to propagate invalidations across shards

  4. System Health
     - Shard distribution: entries per shard (should be balanced)
     - Memory per shard: detect hot shards
     - Replication lag: time for writes to reach replicas
Prometheus alert rules:

- alert: CacheHitRateLow
  expr: cache_hit_ratio < 0.70
  annotations: "Hit rate < 70% → increase TTL or cache size"

- alert: EvictionRateHigh
  expr: evictions_per_second > 1000
  annotations: "Evicting 1K entries/sec → cache thrashing"

- alert: MemoryFull
  expr: used_memory / max_memory > 0.95
  annotations: "Cache 95% full → aggressive eviction starting"

- alert: ShardHotspot
  expr: max_shard_entries > avg_shard_entries * 2
  annotations: "One shard has >2x the average entries → rebalance"

- alert: ReplicationLagHigh
  expr: replica_lag_ms > 1000
  annotations: "Replica >1s behind leader → replication bottleneck"

⚡ Quick Reference Cheat Sheet

Critical Design Decisions

  1. Multi-tier caching → L1 (in-process) + L2 (Redis) + L3 (DB) = full coverage
  2. Probabilistic early refresh → Avoid thundering herd on expiration
  3. Write-through for consistency → Database + cache updated atomically
  4. Cache-aside for simplicity → Invalidate on write, lazy load on read
  5. Sharding for throughput → 12 shards = 12x single-shard throughput
  6. LRU eviction → Keep hot items, evict cold items automatically

Eviction Policy Comparison

Policy   Complexity               Memory Efficiency   Use Case
LRU      O(1) with linked list    Good                Most common; temporal locality
LFU      O(1) with counters       Excellent           Frequency-skewed access; scan-resistant
Random   O(1)                     Fair                Speed priority
TTL      O(1)                     Excellent           Fixed lifetime (sessions)

Consistency Strategies

Strategy        Consistency   Latency                     Complexity
Write-through   Strong        Slower (2 writes)           Medium
Write-behind    Eventual      Fast (cache write only)     High (crash risk)
Cache-aside     Eventual      Medium (invalidation lag)   Low

Tech Stack

L1 Cache: In-process (HashMap/LRU cache library)
L2 Cache: Redis Cluster (12 shards + replicas)
Persistence: RDB snapshots + AOF logs
Monitoring: Prometheus (hit rate, memory, evictions)
Invalidation: Pub/Sub (to broadcast invalidation)

🎯 Interview Summary (5 Minutes)

  1. Multi-tier caching → L1 (in-process) + L2 (Redis) + L3 (database)
  2. Thundering herd prevention → Probabilistic early refresh + distributed lock
  3. Consistency → Write-through (strong) vs write-behind (fast but risky) vs cache-aside (eventual)
  4. Eviction → LRU (keep hot items) or LFU (for frequency-skewed access patterns)
  5. Sharding → 12 shards for 12x throughput, hash ring determines shard
  6. Persistence → RDB snapshots + AOF for durability; at most ~1s of writes lost with fsync everysec
  7. Monitoring → Track hit rate (80%+ target), eviction rate, memory usage

Glossary & Abbreviations

  • LRU (Least Recently Used): evict the entry idle for the longest time
  • LFU (Least Frequently Used): evict the entry with the lowest access count
  • TTL (Time To Live): seconds until a cache entry expires
  • RDB (Redis snapshot): periodic full dump of the dataset to disk
  • AOF (Append-Only File): per-write log replayed on recovery
  • Cache stampede (thundering herd): many simultaneous misses on one expired hot key
  • Write-through: update the database and cache together on every write
  • Write-behind: write to the cache first, flush to the database asynchronously
  • Cache-aside: write to the database, invalidate the cache, lazy-load on the next read
  • L1 / L2 / L3: in-process cache / distributed cache (Redis) / database