System Design Fundamentals
Executive summaries and core concepts for system design interviews.
Core Metrics
Latency
Executive Summary: Time for a single request to travel from source to destination. Reported in units from nanoseconds to milliseconds depending on the operation (see table below). Lower is better.
| Type | Typical Values |
|---|---|
| L1 Cache Reference | 0.5 ns |
| L2 Cache Reference | 7 ns |
| Main Memory Reference | 100 ns |
| SSD Random Read | 150 μs |
| HDD Seek | 10 ms |
| Round trip within same datacenter | 500 μs |
| Round trip CA to Netherlands | 150 ms |
Throughput
Executive Summary: Rate of data transfer or requests processed per unit time. Measured in requests/second (RPS) or transactions/second (TPS). Higher is better.
- Network: Measured in bits per second (bps), Mbps, Gbps
- Application: Requests per second (RPS), Queries per second (QPS)
- Database: Transactions per second (TPS)
Availability
Executive Summary: Percentage of time a system is operational. Expressed as “nines” (99.9% = three nines).
| Availability | Downtime/Year | Downtime/Month | Downtime/Week |
|---|---|---|---|
| 99% (two 9s) | 3.65 days | 7.31 hours | 1.68 hours |
| 99.9% (three 9s) | 8.77 hours | 43.83 min | 10.08 min |
| 99.99% (four 9s) | 52.60 min | 4.38 min | 1.01 min |
| 99.999% (five 9s) | 5.26 min | 26.30 sec | 6.05 sec |
Formula: Availability = Uptime / (Uptime + Downtime)
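As a quick sanity check on the table above, here is a minimal sketch that converts an availability percentage into allowed downtime; the helper name and period constants are illustrative:

```python
# Convert an availability percentage into allowed downtime per period.
SECONDS = {
    "year": 365.25 * 24 * 3600,
    "month": 30.44 * 24 * 3600,
    "week": 7 * 24 * 3600,
}

def allowed_downtime(availability_pct: float, period: str = "year") -> float:
    """Return allowed downtime in seconds for the given period."""
    return (1 - availability_pct / 100) * SECONDS[period]

print(allowed_downtime(99.99) / 60)  # ~52.6 minutes/year, matching the table
```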
Reliability
Executive Summary: Probability that a system will perform its intended function without failure for a specified period. MTBF (Mean Time Between Failures) is the key metric.
- MTBF: Mean Time Between Failures
- MTTR: Mean Time To Recovery
- Availability = MTBF / (MTBF + MTTR). Example: MTBF = 1,000 hours and MTTR = 1 hour gives 1000 / 1001 ≈ 99.9%
Durability
Executive Summary: Guarantee that data, once written, will not be lost. Expressed as probability (e.g., 99.999999999% = 11 nines).
- S3 Standard: 99.999999999% (11 nines) durability
- Achieved through: Replication, Checksums, Error correction codes
CAP Theorem
Executive Summary: In a distributed system, you can only guarantee 2 of 3 properties: Consistency, Availability, Partition Tolerance. Since network partitions are inevitable, choose between CP (consistent but may be unavailable) or AP (available but may be inconsistent).
Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a response (without guarantee it’s the most recent).
Partition Tolerance: System continues to operate despite network partitions.
| Choice | Behavior | Examples |
|---|---|---|
| CP | Consistent, may reject requests during partition | MongoDB, HBase, Redis |
| AP | Available, may return stale data | Cassandra, DynamoDB, CouchDB |
PACELC Extension
Executive Summary: Extends CAP. If a network Partition occurs, trade Availability against Consistency; Else (no partition), trade Latency against Consistency.
- PA/EL: Sacrifice consistency for availability and lower latency (Cassandra, DynamoDB)
- PC/EC: Prefer consistency over availability and latency (VoltDB, H-Store)
- PA/EC: Availability during partition, consistency otherwise (MongoDB)
Consistency Models
Strong Consistency
Executive Summary: All nodes see the same data at the same time. Every read returns the most recent write. Higher latency, simpler programming model.
- Implemented via: Synchronous replication, Consensus protocols (Paxos, Raft)
- Use when: Financial transactions, inventory management
Eventual Consistency
Executive Summary: Given enough time without new updates, all replicas will converge to the same value. Lower latency, higher availability.
- Implemented via: Asynchronous replication, Conflict resolution
- Use when: Social media feeds, DNS, Shopping cart
Causal Consistency
Executive Summary: Operations that are causally related are seen by all nodes in the same order. Concurrent operations may be seen in different orders.
- If A causes B, everyone sees A before B
- Stronger than eventual, weaker than strong
- Use when: Collaborative editing, Comment threads
Read-Your-Writes Consistency
Executive Summary: A user always sees their own writes immediately. Other users may see stale data.
- Session consistency
- Implemented via: Sticky sessions, Version vectors
Linearizability
Executive Summary: Strongest consistency model. Operations appear to happen instantaneously at some point between invocation and response.
- Real-time ordering guaranteed
- Expensive to implement at scale
- Use when: Distributed locks, Leader election
Replication Strategies
Single Leader (Master-Slave)
Executive Summary: One node handles all writes; replicas handle reads. Simple, but leader is a bottleneck and single point of failure.
Client → Leader (Write) → Replicas (Async/Sync)
Client → Replicas (Read)
Pros: Simple, consistent writes, easy conflict resolution
Cons: Leader bottleneck, failover complexity, write latency
Multi-Leader (Master-Master)
Executive Summary: Multiple nodes accept writes. Better write performance and availability, but complex conflict resolution required.
- Use cases: Multi-datacenter, Offline-capable apps
- Conflict resolution: Last-write-wins, Custom merge logic, CRDTs
Leaderless (Dynamo-style)
Executive Summary: Any node can accept reads/writes. Uses quorum to ensure consistency. Highly available, no single point of failure.
- Quorum: W + R > N (write acknowledgments + read acknowledgments > total replicas, so read and write sets always overlap; see the sketch after this list)
- Read Repair: Fix inconsistencies on read
- Anti-entropy: Background process to sync replicas
- Examples: Cassandra, DynamoDB, Riak
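A minimal sketch of the quorum rule, assuming replica objects that expose versioned `read`/`write` calls; those APIs and all names here are illustrative:

```python
# W + R > N guarantees that any read quorum overlaps every write quorum,
# so at least one replica in the read set holds the latest write.
N, W, R = 3, 2, 2
assert W + R > N  # overlap guarantee

def quorum_write(replicas, key, value, version):
    acks = sum(bool(replica.write(key, value, version)) for replica in replicas)
    return acks >= W  # success once W replicas acknowledge

def quorum_read(replicas, key):
    responses = [replica.read(key) for replica in replicas[:R]]
    # Return the value with the highest version; stale replicas can be
    # fixed up afterwards (read repair).
    return max(responses, key=lambda r: r.version)
```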
Partitioning / Sharding
Executive Summary: Split data across multiple nodes to scale horizontally. Each shard contains a subset of data. Key challenge: choosing the right partition key.
Range Partitioning
Executive Summary: Data divided by key ranges (A-M, N-Z). Good for range queries, but can lead to hot spots.
- Pros: Efficient range queries, sorted data
- Cons: Hot spots, uneven distribution
- Use when: Time-series data, alphabetical lookups
Hash Partitioning
Executive Summary: Hash function determines partition. Even distribution, but loses ordering and range query efficiency.
- Pros: Even distribution, no hot spots
- Cons: Inefficient range queries
- Use when: Random access patterns, user data by ID
Consistent Hashing
Executive Summary: Nodes and keys mapped to a ring. Data moves only between adjacent nodes when topology changes, minimizing data movement during scaling (see the ring sketch after the list below).
- Virtual nodes improve load distribution
- Use when: Distributed caches, DHTs
- Examples: Cassandra, DynamoDB, Memcached
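A minimal consistent-hash ring with virtual nodes, assuming MD5 as the ring hash; class and node names are illustrative:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First virtual node clockwise from the key's position on the ring.
        idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))
```

Adding or removing a node only remaps the keys between it and its ring neighbors, which is the property the summary describes.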
Directory-Based Partitioning
Executive Summary: Lookup service maintains mapping of keys to partitions. Flexible but lookup service is potential bottleneck.
Load Balancing
Executive Summary: Distribute traffic across multiple servers to improve performance, reliability, and scalability. Can operate at L4 (transport) or L7 (application).
Algorithms
| Algorithm | Description | Use Case |
|---|---|---|
| Round Robin | Requests distributed sequentially | Equal capacity servers |
| Weighted Round Robin | Servers assigned weights | Different capacity servers |
| Least Connections | Route to server with fewest connections | Long-lived connections |
| IP Hash | Hash client IP to determine server | Session affinity |
| Least Response Time | Route to fastest responding server | Performance optimization |
| Random | Random server selection | Simple, stateless |
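Two of the algorithms above, sketched briefly; server names and weights are illustrative:

```python
import itertools

servers = ["s1", "s2", "s3"]

# Round robin: cycle through servers in order.
rr = itertools.cycle(servers)
next(rr)  # 's1', then 's2', 's3', 's1', ...

# Weighted round robin: repeat each server proportionally to its weight.
weights = {"s1": 3, "s2": 1}  # s1 has 3x the capacity of s2
wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])
```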
Layer 4 vs Layer 7
| Layer 4 (Transport) | Layer 7 (Application) |
|---|---|
| Faster (no content inspection) | Content-based routing |
| TCP/UDP level | HTTP headers, cookies |
| Limited routing options | URL, method-based routing |
| Examples: HAProxy, AWS NLB | Examples: NGINX, AWS ALB |
Health Checks
- Active: Load balancer periodically checks endpoints
- Passive: Monitor response from real traffic
- Criteria: HTTP status, response time, custom checks
Caching
Executive Summary: Store frequently accessed data in fast storage (memory) to reduce latency and database load. Key decisions: what to cache, where to cache, when to invalidate.
Cache Strategies
Cache-Aside (Lazy Loading)
Executive Summary: Application checks cache first, fetches from DB on miss, then populates cache. Simple but can have cache stampede issues.
1. Check cache
2. If miss → Read from DB → Write to cache
3. Return data
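The three steps above as a sketch; `cache` and `db` stand in for a real cache client (e.g. Redis) and a database layer, and their APIs here are assumptions:

```python
def get_user(user_id, cache, db, ttl=300):
    key = f"user:{user_id}"
    user = cache.get(key)             # 1. check cache
    if user is None:                  # 2. miss: read from DB, populate cache
        user = db.query("SELECT * FROM users WHERE id = %s", user_id)
        cache.set(key, user, ttl)     # TTL bounds how stale an entry can get
    return user                       # 3. return data
```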
Read-Through
Executive Summary: Cache sits between app and DB. Cache handles misses by fetching from DB automatically.
Write-Through
Executive Summary: Data written to cache and DB synchronously. Consistent but higher write latency.
Write-Behind (Write-Back)
Executive Summary: Data written to cache immediately, DB updated asynchronously. Lower latency but risk of data loss.
Write-Around
Executive Summary: Data written directly to DB, bypassing cache. Good for write-heavy workloads where data isn’t immediately read.
Cache Invalidation
- TTL (Time-To-Live): Expire after fixed duration
- Event-based: Invalidate on data change
- Version-based: Cache key includes version number
Eviction Policies
| Policy | Description |
|---|---|
| LRU (Least Recently Used) | Evict least recently accessed |
| LFU (Least Frequently Used) | Evict least frequently accessed |
| FIFO | Evict oldest entries first |
| Random | Random eviction |
| TTL | Evict expired entries |
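A minimal LRU sketch using Python's `OrderedDict` (illustrative, not production code):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```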
Cache Levels
- Browser Cache: Client-side, HTTP headers (Cache-Control, ETag)
- CDN Cache: Edge servers close to users
- Application Cache: In-memory (Redis, Memcached)
- Database Cache: Query cache, buffer pool
Message Queues
Executive Summary: Asynchronous communication between services. Decouple producers and consumers. Provides buffering, load leveling, and guaranteed delivery.
Point-to-Point vs Pub/Sub
| Point-to-Point | Pub/Sub |
|---|---|
| One consumer per message | Multiple subscribers |
| Message deleted after consumption | Message delivered to all subscribers |
| Work distribution | Event notification |
| Examples: SQS, RabbitMQ | Examples: Kafka, SNS, Redis Pub/Sub |
Delivery Guarantees
| Guarantee | Description | Trade-off |
|---|---|---|
| At-most-once | Fire and forget | May lose messages |
| At-least-once | Retry until acked | May have duplicates |
| Exactly-once | Deduplication + at-least-once | Complex, expensive |
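At-least-once delivery pushes deduplication to the consumer. A common pattern is an idempotent consumer that tracks processed message IDs and skips redeliveries; the `queue` client, message shape, and business-logic stub below are assumptions:

```python
processed_ids = set()  # production systems persist this, typically with a TTL

def apply_side_effects(message):
    ...  # hypothetical business logic

def handle(queue, message):
    if message["id"] in processed_ids:
        queue.ack(message)               # redelivered duplicate: just ack
        return
    apply_side_effects(message)
    processed_ids.add(message["id"])
    queue.ack(message)                   # ack only after successful processing
```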
Message Ordering
- FIFO: Strict ordering (higher latency)
- Best effort: No ordering guarantee (higher throughput)
- Partition ordering: Order within partition only (Kafka)
Dead Letter Queue (DLQ)
Executive Summary: Queue for messages that failed processing after max retries. Enables debugging and manual intervention.
Proxies
Forward Proxy
Executive Summary: Sits in front of clients. Hides client identity, caches responses, filters content.
- Use cases: Corporate firewalls, Anonymity, Content filtering
Reverse Proxy
Executive Summary: Sits in front of servers. Load balancing, SSL termination, caching, compression.
- Examples: NGINX, HAProxy, Cloudflare
- Use cases: Load balancing, Security, Caching
API Gateway
Executive Summary: Single entry point for microservices. Handles routing, authentication, rate limiting, protocol translation.
- Examples: Kong, AWS API Gateway, Apigee
- Features: Auth, Rate limiting, Logging, Transformation
Database Indexing
Executive Summary: Data structures that improve query speed at the cost of write performance and storage. Choose indexes based on query patterns.
B-Tree Index
Executive Summary: Balanced tree structure. Good for range queries and equality. Most common index type.
- O(log n) lookups, inserts, deletes
- Supports range queries
- Use for: Primary keys, Foreign keys, Range queries
Hash Index
Executive Summary: Hash table for O(1) lookups. Only supports equality queries.
- No range query support
- Use for: Exact match queries
Composite Index
Executive Summary: Index on multiple columns. Order matters (leftmost prefix rule).
```sql
INDEX (a, b, c)
-- Supports: WHERE a, WHERE a AND b, WHERE a AND b AND c
-- Does NOT support: WHERE b, WHERE c, WHERE b AND c
```
Full-Text Index
Executive Summary: Inverted index for text search. Supports natural language queries.
- Examples: Elasticsearch, PostgreSQL tsvector
- Use for: Search functionality
Covering Index
Executive Summary: Index contains all columns needed for query. No table lookup required.
Unique ID Generation
Executive Summary: Generate globally unique identifiers in distributed systems. Trade-offs between sortability, size, and coordination requirements.
UUID
Executive Summary: 128-bit universally unique identifier. No coordination required, but not sortable and large.
- V1: Timestamp + MAC address
- V4: Random (most common)
- Pros: No coordination, globally unique
- Cons: Not sortable, 128 bits, no locality
Snowflake ID
Executive Summary: 64-bit ID composed of a timestamp, machine ID, and sequence number. Time-sortable; needs only a one-time machine ID assignment, no per-ID coordination.
| Sign (1 bit) | Timestamp (41 bits) | Machine ID (10 bits) | Sequence (12 bits) |
- ~69 years of timestamps (2^41 ms ≈ 69.7 years)
- 4096 IDs per millisecond per machine
- Used by: Twitter, Discord
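A sketch of the bit packing above; the custom epoch shown is Twitter's published value, everything else is illustrative:

```python
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch; any fixed epoch works

def snowflake(machine_id: int, sequence: int) -> int:
    assert 0 <= machine_id < 2**10 and 0 <= sequence < 2**12
    timestamp = int(time.time() * 1000) - EPOCH_MS
    # 41-bit timestamp | 10-bit machine ID | 12-bit sequence
    return (timestamp << 22) | (machine_id << 12) | sequence

print(snowflake(machine_id=1, sequence=0))
```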
ULID
Executive Summary: 128-bit Universally Unique Lexicographically Sortable Identifier. Sortable, URL-safe.
- First 48 bits: timestamp
- Last 80 bits: randomness
- Lexicographically sortable
Database Auto-Increment
Executive Summary: Simple sequential IDs. Requires coordination, single point of failure.
- Pros: Simple, small, sortable
- Cons: Coordination required, reveals information
API Design
REST
Executive Summary: Stateless, resource-oriented API using HTTP methods. Simple, cacheable, widely adopted.
| Method | Action | Idempotent |
|---|---|---|
| GET | Read | Yes |
| POST | Create | No |
| PUT | Update (full) | Yes |
| PATCH | Update (partial) | No |
| DELETE | Delete | Yes |
GraphQL
Executive Summary: Query language for APIs. Client specifies exact data needed. Reduces over/under-fetching.
- Pros: Flexible queries, strong typing, single endpoint
- Cons: Complexity, caching challenges, N+1 queries
gRPC
Executive Summary: High-performance RPC using Protocol Buffers. Strongly typed, supports streaming.
- Uses HTTP/2 (multiplexing, header compression)
- Binary format (smaller, faster)
- Use for: Microservices, low-latency requirements
Serialization Formats
| Format | Type | Human Readable | Size | Speed |
|---|---|---|---|---|
| JSON | Text | Yes | Large | Slow |
| XML | Text | Yes | Very Large | Slow |
| Protocol Buffers | Binary | No | Small | Fast |
| Avro | Binary | No | Small | Fast |
| Thrift | Binary | No | Small | Fast |
| MessagePack | Binary | No | Medium | Fast |
Time Synchronization
NTP (Network Time Protocol)
Executive Summary: Synchronizes clocks over network. Accuracy: 1-50ms over internet, <1ms on LAN.
- Hierarchical system of time sources (strata)
- Stratum 0: Atomic clocks, GPS
- Stratum 1: Directly connected to Stratum 0
Logical Clocks
Lamport Timestamps
Executive Summary: Partial ordering of events. If A happened-before B, then L(A) < L(B). Converse not guaranteed.
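A minimal Lamport clock sketch (illustrative class): increment on local events and sends, and on receive take max(local, received) + 1.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event or message send."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Message receive: jump past the sender's timestamp."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```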
Vector Clocks
Executive Summary: Track causality between events. Can determine if events are concurrent or causally related.
- Each node maintains vector of counters
- Expensive in large systems
Hybrid Logical Clocks (HLC)
Executive Summary: Combines physical and logical time. Provides causality while staying close to real time.
Distributed Tracing
Executive Summary: Track requests across microservices. Essential for debugging and performance analysis in distributed systems.
- Trace: End-to-end journey of a request
- Span: Individual operation within a trace
- Context Propagation: Pass trace ID between services
Tools: Jaeger, Zipkin, AWS X-Ray, OpenTelemetry
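A minimal sketch of context propagation over HTTP using the W3C `traceparent` header format; real systems typically rely on an OpenTelemetry SDK rather than hand-rolled headers, and the helper below is illustrative:

```python
import uuid
import requests

def call_downstream(url, trace_id=None):
    trace_id = trace_id or uuid.uuid4().hex        # start a new trace if none
    span_id = uuid.uuid4().hex[:16]                # this service's span
    # traceparent: version - 32-hex trace ID - 16-hex parent span ID - flags
    headers = {"traceparent": f"00-{trace_id}-{span_id}-01"}
    return requests.get(url, headers=headers)
```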