System Design Interview Cheatsheet

5 minute read

Quick reference for system design interviews - executive summaries only.

Core Concepts Quick Reference

Concept	One-Liner	When to Use
Horizontal Scaling	Add more machines	When vertical limits reached
Vertical Scaling	Add more resources to machine	Simple cases, quick fix
Load Balancer	Distribute traffic across servers	Multiple servers
CDN	Cache content at edge locations	Static content, global users
Cache	Store frequently accessed data in memory	Read-heavy workloads
Message Queue	Async communication buffer	Decouple services
Database Replication	Copy data across nodes	High availability, read scaling
Database Sharding	Split data across nodes	Write scaling, large datasets

Database Selection Guide

Use Case	Recommended	Why
Transactions, ACID	PostgreSQL, MySQL	ACID compliance
Document storage	MongoDB	Flexible schema
Wide-column, time-series	Cassandra, HBase	Write-heavy, scale
Key-Value cache	Redis, Memcached	Fast lookups
Graph relationships	Neo4j	Complex relationships
Search	Elasticsearch	Full-text search
Analytics (OLAP)	ClickHouse, BigQuery	Columnar, aggregations
Blob storage	S3, GCS	Large files

Consistency vs Availability

Choose Consistency When	Choose Availability When
Financial transactions	Social media feeds
Inventory management	User preferences
Booking systems	Analytics dashboards
Leader election	Metrics collection
Distributed locks	Shopping cart

Common Patterns Quick Reference

Caching Patterns

Pattern	Write	Read	Best For
Cache-Aside	App writes to DB	Check cache, fallback to DB	General purpose
Read-Through	N/A	Cache fetches from DB	Simple reads
Write-Through	Write cache + DB sync	From cache	Consistency needed
Write-Behind	Write cache, async DB	From cache	Write-heavy

Rate Limiting Algorithms

Algorithm	Description	Pros/Cons
Token Bucket	Tokens refill at rate, requests consume tokens	Allows bursts
Leaky Bucket	Fixed outflow rate	Smooth output
Fixed Window	Counter per time window	Simple, edge spikes
Sliding Window	Moving window counter	Accurate, more memory

ID Generation

Method	Size	Sortable	Coordination
UUID v4	128 bit	No	None
Snowflake	64 bit	Yes	Machine ID
ULID	128 bit	Yes	None
Auto-increment	Variable	Yes	Required

Data Structures for Scale

Structure	Purpose	Use Case
Bloom Filter	Probably in set	Cache lookup, spam filter
HyperLogLog	Count unique items	Unique visitors
Count-Min Sketch	Frequency estimation	Top-K, heavy hitters
Consistent Hash	Distribute across nodes	Distributed cache
Merkle Tree	Detect data differences	Anti-entropy sync
LSM Tree	Write-optimized storage	Write-heavy DBs
B+ Tree	Read-optimized index	Database indexes
Skip List	Sorted data structure	Redis sorted sets
Trie	Prefix lookups	Autocomplete
Geohash/S2	Location encoding	Proximity search
Inverted Index	Text to documents	Full-text search

Communication Patterns

Protocol	Use When	Latency	Complexity
REST/HTTP	Standard APIs	Medium	Low
gRPC	Microservices	Low	Medium
WebSocket	Real-time bidirectional	Low	Medium
SSE	Server push	Low	Low
Message Queue	Async, decoupling	Variable	Medium
GraphQL	Flexible queries	Medium	High

Scaling Numbers

QPS Guidelines

Service Type	Typical QPS/Server
Web server	1K-10K
Database (simple)	1K-5K
Cache (Redis)	100K+
Load Balancer	100K+

Storage Rules of Thumb

Text tweet (140 chars): ~280 bytes
User record: ~1 KB
Image thumbnail: ~20 KB
HD Image: ~2 MB
1 minute video (compressed): ~10 MB

Time Estimates

Seconds in day: 86,400 ≈ 100K
Seconds in month: 2.6M ≈ 3M
Seconds in year: 31.5M ≈ 30M

Reliability Patterns

Pattern	Purpose	Implementation
Circuit Breaker	Fail fast, prevent cascade	Hystrix, Resilience4j
Retry with Backoff	Handle transient failures	Exponential backoff
Bulkhead	Isolate failures	Thread/connection pools
Timeout	Don’t wait forever	Set aggressive timeouts
Fallback	Graceful degradation	Default/cached values
Health Check	Detect failures	/health endpoint
Idempotency	Safe retries	Idempotency keys

Distributed Transactions

Pattern	Consistency	Complexity	Use Case
2PC	Strong	High	Short transactions
Saga	Eventual	Medium	Long transactions
Outbox	At-least-once	Low	DB + Events
TCC	Strong	High	Reservations

Consensus & Coordination

Tool	Use For
Zookeeper	Leader election, config, locks
etcd	Kubernetes, config, service discovery
Consul	Service mesh, discovery, config
Redis	Distributed locks (Redlock)

Observability Stack

Layer	Tools
Metrics	Prometheus, Grafana, Datadog
Logging	ELK Stack, Loki, Splunk
Tracing	Jaeger, Zipkin, X-Ray
Alerting	PagerDuty, OpsGenie

Security Checklist

Interview Questions Mapped to Concepts

Question	Key Concepts
Design URL Shortener	Hash, Base62, Cache, Database
Design Twitter	Feed fanout, Timeline, Sharding
Design Chat System	WebSocket, Presence, Message Queue
Design YouTube	CDN, Transcoding, Adaptive Streaming
Design Instagram	Object Storage, CDN, Feed
Design Uber	Geospatial Index, Matching, Real-time
Design Rate Limiter	Token Bucket, Distributed Counter
Design Notification	Push, Queue, Priority
Design Search	Inverted Index, Ranking, Sharding
Design File Storage	Chunks, Replication, Metadata

High-Level Design Template

┌─────────────────────────────────────────────────────────────┐
│                         Clients                              │
│                  (Web, Mobile, API)                         │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                           CDN                                │
│                   (Static content)                          │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Load Balancer                            │
│                  (L7, Health checks)                        │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      API Gateway                             │
│        (Auth, Rate limit, Routing, SSL termination)        │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Service A   │    │   Service B   │    │   Service C   │
└───────────────┘    └───────────────┘    └───────────────┘
        │                     │                     │
        └─────────────────────┼─────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│     Cache     │    │   Database    │    │ Message Queue │
│    (Redis)    │    │  (Primary)    │    │   (Kafka)     │
└───────────────┘    └───────────────┘    └───────────────┘
                              │
                              ▼
                    ┌───────────────┐
                    │   Replicas    │
                    └───────────────┘

Estimation Formulas

QPS

QPS = (DAU × actions_per_user) / seconds_in_day
Peak QPS ≈ 2-3 × average QPS

Storage

Storage = users × data_per_user × retention_period
Growth = new_users_per_day × data_per_user

Bandwidth

Bandwidth = QPS × request_size (inbound)
Bandwidth = QPS × response_size (outbound)

Cache Size

Cache = QPS × cache_entry_size × cache_duration
Cache hit ratio target: 80-95%

Servers Needed

Servers = Peak_QPS / QPS_per_server
Add 20-30% for headroom

Tradeoff Discussions

Decision	Option A	Option B
Consistency vs Availability	CP (consistency)	AP (availability)
SQL vs NoSQL	ACID, joins, schema	Scale, flexibility
Sync vs Async	Immediate feedback	Decoupling, scale
Push vs Pull	Real-time, more connections	Polling, simpler
Cache vs Fresh	Speed	Accuracy
Monolith vs Microservices	Simple, fast	Scale, teams
Embedded vs Managed	Control, cost	Ease, reliability

Red Flags to Avoid

❌ Single point of failure ❌ No caching strategy ❌ Ignoring data growth ❌ No backup/recovery plan ❌ Tight coupling between services ❌ No rate limiting ❌ Synchronous calls for everything ❌ Not considering peak load ❌ Ignoring security ❌ No monitoring/alerting

Kibana Search

less than 1 minute read

RAG Learning Tutorial - From First Principles to Production

2 minute read

Notes Tutorials & Revisions

1 minute read

Migrating HTTP APIs to HTTPS in Spring Boot - Planning & Implementation

7 minute read