Platform Like Reddit

Interview Time: 60 min | Difficulty: Medium
Key Focus: Feed ranking, voting systems, community moderation, real-time updates


Step 1: Functional & Non-Functional Requirements

Functional Requirements

  • Users can create posts (text, image, link) in communities (subreddits)
  • Users can comment on posts and reply to comments (threaded)
  • Users can upvote/downvote posts and comments
  • Users can earn karma (aggregated upvote count)
  • Feed shows "Hot", "New", "Top" sorted posts
  • Communities moderated by community leaders
  • Users can follow communities and get feed notifications
  • Spam detection and filtering
  • Search posts and comments
  • Gilding (award system)

Non-Functional Requirements

| Requirement  | Target                          | Notes                                  |
|--------------|---------------------------------|----------------------------------------|
| Scale        | 500M users, 100M+ posts/day     | Billions of votes/day                  |
| Latency      | Feed load <500ms                | Sorting billions of posts              |
| Availability | 99.9% uptime                    | Eventually consistent OK               |
| Consistency  | Eventual for votes              | Vote count lag <5 sec acceptable       |
| Throughput   | 1M votes/sec                    | Peak: viral post = 100K votes/min      |
| Sorting      | Hot scores recomputed hourly    | Time-decayed vote score (Reddit-style) |
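A quick back-of-envelope pass on these targets helps sanity-check the design. The 5B votes/day figure below is an assumption standing in for "billions"; the rest comes from the table:

```python
# Rough capacity math from the NFR table (illustrative figures only).
SECONDS_PER_DAY = 86_400

posts_per_day = 100_000_000
avg_post_writes_per_sec = posts_per_day / SECONDS_PER_DAY   # ~1.2K/sec

votes_per_day = 5_000_000_000          # assumed value for "billions of votes/day"
avg_votes_per_sec = votes_per_day / SECONDS_PER_DAY         # ~58K/sec
peak_votes_per_sec = 1_000_000          # stated peak target

print(f"avg post writes/sec: {avg_post_writes_per_sec:,.0f}")
print(f"avg votes/sec:       {avg_votes_per_sec:,.0f}")
print(f"peak/avg vote ratio: {peak_votes_per_sec / avg_votes_per_sec:.0f}x")
```

The peak-to-average vote ratio of roughly 17x is what motivates buffering votes in Redis rather than writing each one to the database.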

Step 2: API Design, Data Model & High-Level Design

Core API Endpoints

POST /communities/{subreddit}/posts
  {title, content, image_url?, post_type: TEXT|IMAGE|LINK}
  → {post_id, created_at, upvote_count: 1 (auto-upvote by author)}

GET /communities/{subreddit}/feed?sort=hot|new|top&time=day|week|all
  → {posts: [{post_id, title, upvote_count, comment_count, created_at}]}

POST /posts/{post_id}/upvote
  {user_id}
  → {success: true, new_score, vote_recorded}

POST /posts/{post_id}/downvote
  {user_id}
  → {success: true, new_score, vote_recorded}

POST /posts/{post_id}/comments
  {content, parent_comment_id?}  -- parent_comment_id for nested replies
  → {comment_id, created_at, depth}

GET /posts/{post_id}/comments?sort=best|new&depth=3
  → {comments: [{comment_id, content, upvote_count, children: [...]}]}

GET /users/{user_id}/profile
  → {karma, post_count, comment_count, joined_at}

GET /search?q={query}&type=posts|comments
  → {results: [...]}
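The threaded `children` arrays returned by GET /posts/{post_id}/comments can be assembled server-side from flat DB rows. A minimal sketch (the function name and row shape are illustrative, not part of the API above):

```python
from collections import defaultdict

def build_comment_tree(rows, max_depth=3):
    """Assemble nested comment dicts from flat rows.

    `rows` is assumed to be [(comment_id, parent_comment_id, content), ...],
    already sorted by the requested ranking (best/new).
    """
    children = defaultdict(list)
    for comment_id, parent_id, content in rows:
        children[parent_id].append((comment_id, content))

    def attach(parent_id, depth):
        if depth > max_depth:
            return []  # deeper replies are fetched lazily ("load more")
        return [
            {"comment_id": cid, "content": text,
             "children": attach(cid, depth + 1)}
            for cid, text in children[parent_id]
        ]

    return attach(None, 1)  # top-level comments have parent_comment_id = NULL

# Usage: comment 3 is kept (depth 3), comment 4 is cut by the depth limit.
rows = [(1, None, "top"), (2, 1, "reply"), (3, 2, "deep"), (4, 3, "deeper")]
tree = build_comment_tree(rows, max_depth=3)
```

Grouping rows by parent first makes the assembly a single pass plus one recursive walk, instead of one query per nesting level.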

Entity Data Model

USERS
├─ user_id (PK)
├─ username (UNIQUE)
├─ total_karma (denormalized, sum of post + comment upvotes)
├─ created_at
├─ last_active_at

COMMUNITIES (subreddits)
├─ community_id (PK)
├─ name (UNIQUE, e.g., "programming")
├─ description
├─ subscriber_count
├─ moderators (array of user_ids)
├─ rules, nsfw_flag
├─ created_at

POSTS
├─ post_id (ULID, PK, sortable by time)
├─ user_id (FK)
├─ community_id (FK)
├─ title, content (TEXT)
├─ score (cached: upvotes - downvotes)
├─ upvote_count, downvote_count (denormalized, updated async)
├─ comment_count (denormalized)
├─ hot_score (calculated periodically for "Hot" ranking)
├─ created_at, updated_at
├─ is_deleted (soft delete)

VOTES (on posts)
├─ user_id (FK)
├─ post_id (FK)
├─ vote_value (-1=downvote, 1=upvote, 0=neutral)
├─ created_at
├─ PRIMARY KEY (user_id, post_id)

COMMENTS
├─ comment_id (ULID, PK, sortable by time)
├─ post_id (FK)
├─ user_id (FK)
├─ parent_comment_id (FK, nullable — for nested replies)
├─ content (TEXT)
├─ score (upvotes - downvotes)
├─ upvote_count, downvote_count (denormalized)
├─ best_score (for "best" ranking)
├─ depth (nesting level, max 10)
├─ created_at
├─ is_deleted (soft delete)

COMMENT_VOTES (on comments)
├─ user_id (FK)
├─ comment_id (FK)
├─ vote_value (-1, 0, 1)
├─ created_at
├─ PRIMARY KEY (user_id, comment_id)

SUBSCRIPTIONS (user following communities)
├─ user_id (FK)
├─ community_id (FK)
├─ subscribed_at
├─ PRIMARY KEY (user_id, community_id)

AWARDS (gilding)
├─ award_id (PK)
├─ post_id or comment_id (FK)
├─ user_id (recipient of the award)
├─ award_type (Gold, Platinum, etc.)
├─ given_at

High-Level Architecture

graph TB
    User["👤 User"]
    LB["Load Balancer"]

    FEED["Feed Service<br/>(ranking, scoring)"]
    POST["Post Service<br/>(CRUD)"]
    VOTE["Vote Service<br/>(counting, idempotent)"]
    COMMENT["Comment Service<br/>(threading, nesting)"]

    CACHE["Redis Cache<br/>(hot posts, scores,<br/>user feeds)"]
    DB["PostgreSQL<br/>(posts, comments,<br/>votes, metadata)"]

    RANKING_JOB["Ranking Job<br/>(hot score batch)<br/>runs hourly"]
    SEARCH["Elasticsearch<br/>(full-text search)"]

    NOTIFICATION["Notification Service<br/>(new comments,<br/>upvotes on my posts)"]
    QUEUE["Message Queue<br/>(async events)"]

    User --> LB
    LB --> FEED
    LB --> POST
    LB --> VOTE
    LB --> COMMENT

    FEED --> CACHE
    FEED --> DB

    VOTE --> CACHE
    VOTE --> QUEUE

    QUEUE --> RANKING_JOB
    QUEUE --> NOTIFICATION

    RANKING_JOB --> DB
    RANKING_JOB --> CACHE

    COMMENT --> DB
    COMMENT --> CACHE

    POST --> DB
    POST --> SEARCH

Step 3: Concurrency, Consistency & Scalability

🔴 Problem: Vote Counting Race Condition

Scenario: A post gets 100K upvotes nearly simultaneously. Multiple service instances read-modify-write the same counter, and lost updates leave it at 95K instead of 100K.

Solution: Redis-based Vote Aggregation

Vote Flow (idempotent, handles retries):

1. User upvotes post
   POST /posts/{post_id}/upvote {user_id}

2. [IDEMPOTENCE CHECK]
   HGET votes:{post_id} user_id

   If exists (user already voted):
     → Return current score
     → No double-vote

   If not exists (first vote):
     → Proceed to vote

3. Vote recorded in Redis (fast, in-memory):
   HSET votes:{post_id} user_id 1  -- user_id → 1 (upvote)
   INCR score:{post_id}  -- increment score

4. Event sent to queue (async):
   PUBLISH vote:events {post_id, user_id, timestamp}

5. Background job (batch every 5 seconds):
   GROUP votes by post_id
   Aggregate: SUM(upvotes) - SUM(downvotes) = final_score
   UPDATE posts SET upvote_count = ..., score = ...
   (This is eventually consistent, lag <5 sec)

Result:
  - Redis handles 1M votes/sec (easy)
  - DB eventually gets accurate count (writes batched, reducing DB load)
  - Vote is idempotent (retries OK, no double-count)

Data Flow:
  Redis (hot, accurate, temporary):
    votes:{post_id}: {user_id_1: 1, user_id_2: -1, user_id_3: 1}

  Batch aggregation:
    upvotes = 2, downvotes = 1, net_score = 1
      → DB write (UPDATE posts SET ...)

  Cache:
    score:{post_id} = 1  (refreshed from the DB value after each batch)
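Steps 2 and 3 above can be sketched in a few lines. `FakeRedis` is an in-memory stand-in for a real client (redis-py exposes the same hget/hset/incrby commands), and `record_vote` is an illustrative name; as a small extension of the flow above, this sketch also lets a user flip an earlier vote:

```python
class FakeRedis:
    """In-memory stand-in for a Redis client (hashes + counters only)."""
    def __init__(self):
        self.hashes, self.counters = {}, {}

    def hget(self, key, field):
        return self.hashes.get(key, {}).get(field)

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value

    def incrby(self, key, amount):
        self.counters[key] = self.counters.get(key, 0) + amount
        return self.counters[key]

def record_vote(r, post_id, user_id, vote_value):
    """Idempotently record a +1/-1 vote; returns the new cached score."""
    votes_key, score_key = f"votes:{post_id}", f"score:{post_id}"
    existing = r.hget(votes_key, user_id)
    if existing == vote_value:
        # Retry or double-click: nothing to do (idempotence check).
        return r.incrby(score_key, 0)
    # Apply the delta relative to any previous vote (up -> down = -2).
    delta = vote_value - (existing or 0)
    r.hset(votes_key, user_id, vote_value)
    return r.incrby(score_key, delta)
```

Because the per-user vote is stored alongside the counter, a retried request observes its own earlier write and leaves the score unchanged.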

🟡 Problem: Feed Ranking (Hot, New, Top)

Scenario: Decide the ranking order. Pure recency? Low-vote posts crowd out good ones. Pure vote count? Old posts dominate forever. We need time decay.

Solution: Time-Decay Scoring Algorithm

Reddit's (open-sourced) Hot Algorithm, simplified:

Score = sign(net) × log10(max(|net|, 1)) + seconds_since_epoch / 45000

where net = upvotes - downvotes, and seconds_since_epoch counts from a
fixed reference date (Reddit used 2005-12-08), not from the present.

Components:
1. log10(max(|net|, 1)) — logarithmic vote weight:
   100 votes ≈ 2 points, 1000 votes ≈ 3 points
   → Prevents very high-vote posts from dominating forever
   → Each additional vote is worth less than the previous one

2. seconds_since_epoch / 45000 — recency term
   45000 seconds = 12.5 hours
   → A post effectively trails a brand-new post by 1 point per
     12.5 hours of age
   → Forces older posts to fall down the feed as new posts arrive

Example (scores shown relative to a brand-new post):
  Post A: 1000 net upvotes, created 1 hour ago
    Vote term = log10(1000) = 3.0
    Age deficit = 3600 / 45000 ≈ 0.08
    Relative score ≈ 3.0 - 0.08 ≈ 2.92

  Post B: 10 net upvotes, created just now
    Vote term = log10(10) = 1.0
    Relative score ≈ 1.0

  → But if Post A is 12.5 hours old:
    Relative score = 3.0 - 1.0 = 2.0

  → A 100× vote gap can shrink to a 1-point score gap within hours
  → Prevents long-lived high-vote posts from crowding out fresh content

Batch Computation:
  - Calculate hot_score every 1 hour for all posts
  - Store in DB as materialized view
  - Query returns pre-sorted posts (no real-time calculation)
  - Lag: ±30 minutes acceptable for "hot" ranking
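The batch-computed score can be sketched as follows, following the structure of Reddit's open-sourced ranking code (the epoch constant is Reddit's; treat the function as illustrative, not the exact production formula):

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Reference epoch used by Reddit's open-sourced ranking code.
EPOCH = datetime(2005, 12, 8, 7, 46, 43, tzinfo=timezone.utc)

def hot_score(upvotes, downvotes, created_at):
    """Simplified hot rank: logarithmic vote weight plus a recency term."""
    net = upvotes - downvotes
    order = log10(max(abs(net), 1))
    sign = 1 if net > 0 else -1 if net < 0 else 0
    seconds = (created_at - EPOCH).total_seconds()
    return round(sign * order + seconds / 45000, 7)

# A post 12.5 hours older trails an otherwise-identical post by ~1 point:
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
gap = hot_score(1000, 0, now) - hot_score(1000, 0, now - timedelta(hours=12.5))
```

Because created_at never changes, a post's hot_score is constant between vote updates, which is what makes hourly batch materialization safe.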

Solution: Comment Threading Depth Control

Comments can be nested: reply to reply to reply
BUT: Deep nesting kills UX (hard to read, API load)

Solution: Cap nesting depth to 10

Example (a cap of 3 shown for brevity; the production cap is 10):
  Post P1
  ├─ Comment C1 (depth 1)
  │  └─ Reply C2 (depth 2)
  │     └─ Reply C3 (depth 3, cap reached)
  │        └─ Reply to C3 → re-threaded as a reply to C1 (stored at depth 2)

Query optimization:
  GET /posts/{post_id}/comments?depth=3
  → Top-level comments only
  → For each comment, fetch children up to depth 3
  → Reduces payload (unbounded nesting would otherwise blow up response size)

Caching:
  cache:comments:{post_id}:depth_2 = {comment_ids at depths 0-2}
  TTL: 5 minutes
  → Fast fetching, reduced depth traversal
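The depth cap can be expressed as a placement rule at write time. This sketch re-threads a too-deep reply as a sibling of the capped comment; the example above re-threads one level higher, and either choice works as long as stored depth never exceeds the cap:

```python
MAX_DEPTH = 10  # nesting cap from the design above

def place_reply(parent):
    """Return (parent_comment_id, depth) for a new reply.

    `parent` is the comment being replied to, as a dict with
    'comment_id', 'parent_comment_id', and 'depth' keys.
    """
    if parent["depth"] < MAX_DEPTH:
        return parent["comment_id"], parent["depth"] + 1
    # Parent sits at the cap: attach the reply as its sibling instead,
    # so stored depth never exceeds MAX_DEPTH.
    return parent["parent_comment_id"], parent["depth"]
```

Enforcing the cap on write keeps the `WHERE depth < 10` read-side indexes honest: no query ever has to defend against runaway nesting.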

Step 4: Persistence Layer, Caching & Monitoring

Database Design

CREATE TABLE posts (
  post_id BIGSERIAL PRIMARY KEY,  -- ULIDs (per the data model) also work; BIGSERIAL shown for simplicity
  user_id BIGINT NOT NULL REFERENCES users(user_id),
  community_id BIGINT NOT NULL REFERENCES communities(community_id),
  title VARCHAR(300),
  content TEXT,
  upvote_count INT DEFAULT 1,  -- self-upvote
  downvote_count INT DEFAULT 0,
  comment_count INT DEFAULT 0,
  score INT DEFAULT 1,  -- upvotes - downvotes
  hot_score DECIMAL(10,4),  -- time-decay score, materialized
  created_at TIMESTAMP DEFAULT NOW(),
  is_deleted BOOLEAN DEFAULT FALSE
);

CREATE INDEX idx_posts_community_hot 
  ON posts(community_id, hot_score DESC) 
  WHERE NOT is_deleted;

CREATE INDEX idx_posts_created_at 
  ON posts(created_at DESC) 
  WHERE NOT is_deleted;

-- Votes on Posts
CREATE TABLE post_votes (
  user_id BIGINT NOT NULL REFERENCES users(user_id),
  post_id BIGINT NOT NULL REFERENCES posts(post_id),
  vote_value SMALLINT,  -- -1, 0, 1
  created_at TIMESTAMP DEFAULT NOW(),
  PRIMARY KEY (user_id, post_id)
);

CREATE INDEX idx_post_votes_post_id 
  ON post_votes(post_id);

-- Comments
CREATE TABLE comments (
  comment_id BIGSERIAL PRIMARY KEY,
  post_id BIGINT NOT NULL REFERENCES posts(post_id),
  user_id BIGINT NOT NULL REFERENCES users(user_id),
  parent_comment_id BIGINT REFERENCES comments(comment_id),
  content TEXT,
  upvote_count INT DEFAULT 1,
  downvote_count INT DEFAULT 0,
  score INT DEFAULT 1,
  best_score DECIMAL(10,4),  -- for "best" ranking
  depth INT,  -- 0 = top-level, max 10
  created_at TIMESTAMP DEFAULT NOW(),
  is_deleted BOOLEAN DEFAULT FALSE
);

CREATE INDEX idx_comments_post_best 
  ON comments(post_id, best_score DESC) 
  WHERE depth < 10 AND NOT is_deleted;

CREATE INDEX idx_comments_parent 
  ON comments(parent_comment_id) 
  WHERE NOT is_deleted;

-- Comment Votes
CREATE TABLE comment_votes (
  user_id BIGINT NOT NULL REFERENCES users(user_id),
  comment_id BIGINT NOT NULL REFERENCES comments(comment_id),
  vote_value SMALLINT,
  created_at TIMESTAMP DEFAULT NOW(),
  PRIMARY KEY (user_id, comment_id)
);

-- Subscriptions (user follows communities)
CREATE TABLE subscriptions (
  user_id BIGINT NOT NULL REFERENCES users(user_id),
  community_id BIGINT NOT NULL REFERENCES communities(community_id),
  subscribed_at TIMESTAMP DEFAULT NOW(),
  PRIMARY KEY (user_id, community_id)
);

CREATE INDEX idx_subscriptions_user 
  ON subscriptions(user_id);

-- Karma (denormalized, updated async)
CREATE TABLE user_karma (
  user_id BIGINT PRIMARY KEY REFERENCES users(user_id),
  post_karma INT DEFAULT 0,  -- sum of post upvotes
  comment_karma INT DEFAULT 0,
  total_karma INT DEFAULT 0,
  updated_at TIMESTAMP DEFAULT NOW()
);
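The composite primary key on post_votes is what makes vote writes naturally idempotent: a retry or a changed vote becomes an upsert on the same row. A runnable sketch using SQLite's in-memory engine (PostgreSQL accepts the same ON CONFLICT syntax; requires SQLite ≥ 3.24):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_votes (
        user_id    INTEGER NOT NULL,
        post_id    INTEGER NOT NULL,
        vote_value INTEGER,
        PRIMARY KEY (user_id, post_id)
    )
""")

def cast_vote(user_id, post_id, vote_value):
    # Insert, or overwrite this user's previous vote on the same post.
    conn.execute(
        """INSERT INTO post_votes (user_id, post_id, vote_value)
           VALUES (?, ?, ?)
           ON CONFLICT (user_id, post_id)
           DO UPDATE SET vote_value = excluded.vote_value""",
        (user_id, post_id, vote_value),
    )

cast_vote(1, 42, 1)
cast_vote(1, 42, 1)    # retry: no duplicate row
cast_vote(1, 42, -1)   # change of mind: same row, new value
rows = conn.execute("SELECT COUNT(*), SUM(vote_value) FROM post_votes").fetchone()
print(rows)  # (1, -1)
```

However many times the request is replayed, there is exactly one row per (user_id, post_id), so the batch aggregation job always sums a consistent set.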

Caching Strategy

Tier 1: Redis

1. Feed Cache (pre-computed hot posts)
   Key: "feed:{community_id}:hot"
   Value: [post_id_1, post_id_2, ..., post_id_100] (sorted by hot_score)
   TTL: 30 minutes
   Purpose: Avoid re-sorting billions of posts on every page load
   Hit rate: ~70% (most users load the same hot posts)

2. Post Score Cache
   Key: "score:{post_id}"
   Value: {upvotes: 1000, downvotes: 50, net: 950}
   TTL: 5 minutes
   Purpose: Don't hit DB for every score query

3. Comment Tree Cache
   Key: "comments:{post_id}:top_level"
   Value: [comment_id_1, comment_id_2, ...] (top 10 by best_score)
   TTL: 5 minutes
   Purpose: Fast comment loading

4. User Vote Cache (personal votes)
   Key: "votes:{user_id}:{post_id}"
   Value: {vote_value: 1}  -- to check if user already voted
   TTL: 1 hour
   Purpose: Prevent double-voting and show user's vote state

5. Community Feed Cache
   Key: "community:{community_id}:new"
   Value: [post_id_1, post_id_2, ...] (latest 100 posts)
   TTL: 10 minutes
   Purpose: Fast "new" feed loading

Tier 2: Database (batch write buffer)

Vote Buffer (in Redis):
  vote:{post_id}:pending = {user_id_1, user_id_2, ...}

Batch Job (every 5 seconds):
  1. Read all pending votes from Redis
  2. Aggregate by post_id
  3. Single UPDATE to DB:
     UPDATE posts SET upvote_count = upvote_count + ?, ...
  4. Clear Redis buffer

Result:
  - 1M votes/sec in Redis (immediate, durable via AOF)
  - Batched to DB every 5 sec (reduce DB write load)
  - Eventual consistency: Vote appears in DB within 5 sec
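One cycle of the 5-second batch job can be sketched with illustrative names: `pending` stands for the drained Redis buffer as (post_id, vote_delta) events, and `apply_update` wraps the single UPDATE issued per post:

```python
from collections import Counter

def drain_vote_buffer(pending, apply_update):
    """Aggregate buffered vote events and flush one DB write per post.

    `pending`: list of (post_id, vote_delta) events from the Redis buffer.
    `apply_update(post_id, delta)`: issues the single UPDATE per post,
    e.g. UPDATE posts SET score = score + ? WHERE post_id = ?.
    """
    totals = Counter()
    for post_id, delta in pending:
        totals[post_id] += delta
    for post_id, delta in totals.items():
        apply_update(post_id, delta)
    return dict(totals)

# Usage: three events collapse into two DB writes.
flushed = drain_vote_buffer([(1, 1), (1, 1), (2, -1)], lambda p, d: None)
```

The key property is that DB write volume scales with the number of distinct hot posts per window, not with raw vote throughput.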

Monitoring & Alerts

Key Metrics:

  Feed Performance
  • Feed load latency (P95 <500ms target)
  • Cache hit rate for feeds (70%+ target)
  • "Hot" algorithm freshness (updated hourly)

  Vote Processing
  • Vote throughput (votes/sec)
  • Vote lag (vote appears in feed, <5 sec target)
  • Vote duplication rate (should be <0.01%)
  • Redis buffer size (should be <100K entries)

  Comment Quality
  • Comment depth distribution (% at each level)
  • Deep nesting incidents (depth >10 should be rare)
  • Comment load latency (P95 <200ms)

  Spam & Moderation
  • Posts deleted by moderators (count/day)
  • Flagged/reported content (processing time)
  • Karma manipulation attempts (vote brigading)

  User Engagement
  • DAU/MAU growth
  • Post creation rate (posts/min)
  • Comment rate (comments/min)
  • Upvote/post ratio (avg upvotes per post)
Alert Rules (Prometheus-style):

- alert: VoteLagHigh
  expr: vote_db_lag_seconds > 10
  annotations: "Vote lag > 10s → Redis buffer issue or DB bottleneck"

- alert: FeedLatencyHigh
  expr: feed_load_latency_p95 > 1000
  annotations: "Feed latency > 1s → check hot_score materialization"

- alert: HotScoreStale
  expr: hours_since_last_ranking_update > 2
  annotations: "Hot ranking not updated in 2 hours → batch job failed"

- alert: CacheMissRateHigh
  expr: feed_cache_hit_rate < 0.5
  annotations: "Feed cache hit rate < 50% → increase cache TTL or size"

⚡ Quick Reference Cheat Sheet

Critical Design Decisions

  1. Redis vote aggregation — Fast voting counters, batch to DB every 5 sec
  2. Time-decay scoring — log10(net votes) + recency term keeps stale posts from dominating
  3. Hourly hot_score materialization — Pre-compute and cache, avoid live calculation
  4. Comment depth limit = 10 — Prevents deep nesting, improves query performance
  5. Feed caching (30 min) — 70% cache hit rate saves billions of DB queries
  6. Vote idempotency — Check (user_id, post_id) primary key, retries safe

When to Use What

| Need              | Technology                               | Why                                 |
|-------------------|------------------------------------------|-------------------------------------|
| Vote counting     | Redis + batch writes                     | 1M votes/sec + eventual consistency |
| Feed ranking      | Materialized hot_score                   | Sub-500ms feed loads                |
| Comment threading | Depth limit (10) + parent_comment index  | Prevents exponential queries        |
| Full-text search  | Elasticsearch                            | Fast title/content search           |

Tech Stack

Backend: Python/Go (stateless)
Database: PostgreSQL (posts, comments, votes)
Cache: Redis (scores, feed, votes)
Search: Elasticsearch
Batch: Spark/Flink (ranking job)
Monitoring: Prometheus

🎯 Interview Summary (5 Minutes)

  1. Vote aggregation → Redis + batch writes every 5 sec to DB
  2. Hot algorithm → log10(votes) + time_decay, materialized hourly
  3. Feed caching → 30-min TTL, 70% hit rate saves DB queries
  4. Comment nesting → Cap depth at 10 to prevent exponential loading
  5. Vote idempotence → Primary key (user_id, post_id) prevents double-votes
  6. Ranking signals → Votes, time, comments, gilding all feed into hot_score
  7. Eventual consistency → Votes cached in Redis, DB updated every 5 sec
