E-commerce Recommendation System¶
Interview Time: 60 min | Difficulty: Medium
Key Focus: Machine learning, collaborative filtering, ranking, personalization at scale
Step 1: Functional & Non-Functional Requirements¶
Functional Requirements¶
- Recommend products to users based on past behavior
- Personalized recommendations per user (each user sees different products)
- Support different algorithms: collaborative filtering, content-based, trending
- A/B testing of recommendation strategies
- Real-time personalization during active session
- Fallback recommendations if user has no history
- Explain why an item was recommended ("Customers like you also bought...", "Trending now")
- Support for new users (cold start problem)
- Support for new products (new inventory)
- Diversity in recommendations (not all same category)
Non-Functional Requirements¶
| Requirement | Target | Notes |
|---|---|---|
| Latency | <100ms for recommendations | Real-time, during user session |
| Scale | 100M+ users, 10M+ products | Billions of interaction events |
| Accuracy | Click-through rate (CTR) >2% | A/B tests track improvement |
| Freshness | Daily model updates | Models trained on yesterday's data |
| Throughput | 100K recommendation requests/sec | Peak traffic during shopping events |
| Compute | Offline batch (hours) + online serving | Training is expensive, serving is cheap |
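These targets translate into rough storage and traffic numbers. A back-of-envelope sketch in Python; the per-record byte sizes, the 20% daily-active ratio, and the 50 events/user figure are illustrative assumptions, not part of the requirements:

# Back-of-envelope sizing from the targets above; byte sizes and the
# daily-active ratio are assumptions for illustration only.
USERS = 100_000_000
PRODUCTS = 10_000_000
EMBEDDING_DIM = 128

daily_active = int(USERS * 0.20)                 # assume 20% of users active per day
daily_events = daily_active * 50                 # assume ~50 events per active user
event_bytes = 100                                # assumed average event row size
event_storage_gb_per_day = daily_events * event_bytes / 1e9

# Embeddings: 128 float32 values per user and per product
embedding_storage_gb = (USERS + PRODUCTS) * EMBEDDING_DIM * 4 / 1e9

# Precomputed recs: top-10 product IDs (8 bytes each) per active user
recs_storage_gb = daily_active * 10 * 8 / 1e9

print(f"events/day: {daily_events:,} (~{event_storage_gb_per_day:.0f} GB/day)")
print(f"embeddings: ~{embedding_storage_gb:.0f} GB total")
print(f"precomputed recs: ~{recs_storage_gb:.1f} GB")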
Step 2: API Design, Data Model & High-Level Design¶
Core API Endpoints¶
GET /recommendations?user_id={id}&num_items=10
→ {items: [{product_id, score, reason: "Popular in Electronics"}]}
GET /recommendations/trending?category={cat}&num_items=10
→ {items: [{product_id, score, popularity}]}
POST /events/click
{user_id, product_id, timestamp}
→ {status: logged}
POST /events/purchase
{user_id, product_id, price, timestamp}
→ {status: logged}
GET /models/status
→ {last_trained_at, training_accuracy, model_version}
POST /models/ab-test
{treatment_model_id, control_model_id, duration_hours: 24}
→ {test_id, start_time, metrics_dashboard}
Entity Data Model¶
USERS
├─ user_id (PK)
├─ country, language, device_type
├─ first_seen_at, last_seen_at
PRODUCTS
├─ product_id (PK)
├─ name, category, price
├─ created_at, last_updated_at
├─ embedding_vector (learned by model) -- for similarity search
USER_EVENTS (activity log)
├─ user_id (FK)
├─ product_id (FK)
├─ event_type (CLICK, VIEW, PURCHASE, WISHLIST, RETURN)
├─ timestamp
├─ session_id (groups events from single session)
├─ PRIMARY KEY (user_id, timestamp)
USER_PRODUCT_INTERACTIONS (aggregated)
├─ user_id (FK)
├─ product_id (FK)
├─ click_count, purchase_count, view_duration
├─ last_interaction_at
├─ score = (clicks + 3*purchases) / recency_factor
├─ PRIMARY KEY (user_id, product_id)
RECOMMENDATIONS (precomputed offline)
├─ recommendation_id
├─ user_id (FK)
├─ product_id (FK)
├─ score (model confidence, 0-1)
├─ reason (TEXT, explanation)
├─ model_version (which version generated this)
├─ created_at
├─ PRIMARY KEY (user_id, product_id, model_version)
MODELS (trained recommendation models)
├─ model_id (ULID, PK)
├─ model_type (COLLAB_FILTER, CONTENT_BASED, HYBRID)
├─ model_version (v1, v2, etc.)
├─ training_date
├─ metrics {accuracy, precision, recall, auc}
├─ status (TRAINING, LIVE, ARCHIVED)
├─ training_samples_count
├─ created_at, promoted_to_live_at
A_B_TESTS
├─ test_id (PK)
├─ control_model_id (FK)
├─ treatment_model_id (FK)
├─ start_time, end_time
├─ num_users
├─ control_ctr, treatment_ctr
├─ winner (control|treatment)
├─ pvalue (statistical significance)
High-Level Architecture¶
graph TB
User["👤 User"]
LB["Load Balancer"]
ONLINE["Online Serving<br/>(real-time)"]
CACHE["Redis Cache<br/>(precomputed recs<br/>per user)"]
MODEL_A["Model A<br/>(GRU neural net)"]
MODEL_B["Model B<br/>(Collaborative filter)"]
EVENT_LOG["Event Logging<br/>(clicks, purchases)"]
BATCH_TRAINING["Batch Training Job<br/>(nightly, 4 hours)"]
DATA["Training Data<br/>(clicks, purchases<br/>from yesterday)"]
FEATURES["Feature Store<br/>(user features,<br/>product features)"]
OFFLINE_RANKING["Offline Ranking<br/>(generate recs<br/>for all users)"]
METRICS["A/B Metrics<br/>(CTR, conversion)"]
User --> LB
LB --> ONLINE
ONLINE --> CACHE
ONLINE --> MODEL_A
ONLINE --> MODEL_B
ONLINE --> EVENT_LOG
BATCH_TRAINING -.-> DATA
DATA --> FEATURES
FEATURES --> BATCH_TRAINING
BATCH_TRAINING --> OFFLINE_RANKING
OFFLINE_RANKING --> CACHE
ONLINE --> METRICS
Step 3: Concurrency, Consistency & Scalability¶
🔴 Problem: Cold Start (New Users)¶
Scenario: New user has no history. Can't use collaborative filtering (no similar users). What to recommend?
Solution: Multi-tier Fallback Strategy
Tier 1: Personalized (requires user history)
IF user_history.size() > 50 events:
→ Use collaborative filter: "Users like you bought..."
→ User embedding in vector space
→ Find K nearest neighbors in user space
→ Recommend items those neighbors liked
ELSE:
→ Fallback to Tier 2
Tier 2: Category-based (no history needed)
IF user.browsing_category is known:
→ Recommend top-selling items in category
→ "Popular in Electronics"
ELSE:
→ Fallback to Tier 3
Tier 3: Trending (brand new user, no context)
→ Recommend trending items globally
→ "Trending Now"
→ "New Arrivals"
→ Personalize later (Tier 1) once user has 50+ events
Example:
Day 1 (new user): Trending recommendations
Day 2 (20 clicks, 1 purchase): Category-based
Day 7 (50+ events): Can use collaborative filtering
→ "Users like you bought X"
→ "Because you viewed Y category"
🟡 Problem: Scalability (100M+ Users × 10M+ Products = 10^15 Pairs)¶
Scenario: Can't store recommendations for every user-product pair (10^15 pairs would be petabytes even at a few bytes each). Can't compute at query time either (scoring 10M products per request takes seconds, not milliseconds).
Solution: Bucketing + Offline Precomputation
Offline Process (happens every night):
1. Train Model (4-6 hours)
Input: Yesterday's click/purchase events
Algorithm: Collaborative filtering (Matrix Factorization) or Neural Network
Output: User embeddings (128D vector per user), Item embeddings (128D vector per product)
2. Bucketing by User Segment
Segment users:
- By geography (US, EU, APAC)
- By device (mobile, desktop)
- By user value (whales, regular, new)
Why? Don't need recs for ALL users every night
→ Generate fresh recs only for active users in each segment
→ Saves 50% compute
3. Generate Recommendations (2-3 hours)
For each user in batch:
a) Fetch user embedding (128D)
b) Compute similarity to all products:
similarity(user, product) = dot_product(user_vec, product_vec)
c) Top-10 products = highest similarity scores
d) Apply re-ranking (see below)
e) Store in Redis/DB: "recs:{user_id}" = [product_1, product_2, ...]
Parallelized: ~100K users/second across the cluster
Raw scoring: 10M active users / 100K per sec ≈ 100 seconds; the 2-3 hour budget covers feature lookups, re-ranking, and cache writes end to end
4. Store in Redis (fast serving)
KEY: "recs:{user_id}"
VALUE: [product_1, product_2, product_3, ...] (as JSON)
TTL: 24 hours (refresh daily)
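Steps 3-4 can be sketched with NumPy and a Redis client; the array shapes and key format follow the description above, while the function name and batching are illustrative:

# Sketch of offline steps 3-4: score a batch of users against all product
# embeddings and write each user's top-10 to Redis with a 24h TTL.
import json
import numpy as np
import redis

r = redis.Redis()

def precompute_recs(user_ids, user_embs, product_ids, product_embs, k=10):
    # user_embs: (batch, 128), product_embs: (num_products, 128)
    scores = user_embs @ product_embs.T                      # dot-product similarity
    top_idx = np.argpartition(-scores, k, axis=1)[:, :k]     # unordered top-k per user
    for row, uid in enumerate(user_ids):
        order = np.argsort(-scores[row, top_idx[row]])       # sort the k candidates by score
        idx = top_idx[row][order]
        payload = {
            "products": [int(product_ids[i]) for i in idx],
            "scores": [round(float(scores[row, i]), 3) for i in idx],
        }
        r.set(f"recs:{uid}", json.dumps(payload), ex=24 * 3600)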
Online Process (real-time, sub-100ms):
1. User opens app: GET /recommendations?user_id=123
2. Server:
→ GET "recs:123"
→ Cache hit! (99% of users updated daily)
→ Return [product_1, product_2, ...]
→ Response time: <5ms
If user not in cache (new user or cache expired):
→ Fall back to Tier 2 (category-based)
→ Compute on-the-fly (100-200ms acceptable)
→ Update cache: SET "recs:123" [...] EX 86400
Result:
- 99% of requests served from cache (<5ms)
- Batch training amortizes cost across 24 hours
- No real-time model inference needed
- Scales to 100M users
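The read path can then be as small as a cache lookup plus the fallback tiers. A sketch, reusing the hypothetical get_category_top / get_trending helpers from the cold-start section:

# Sketch of the online read path: ~99% of requests are a single Redis GET;
# misses fall through to the cheaper fallback tiers and backfill the cache.
import json
import redis

r = redis.Redis()

def serve_recommendations(user_id, category=None, num_items=10):
    cached = r.get(f"recs:{user_id}")
    if cached:
        return json.loads(cached)["products"][:num_items]    # cache hit, <5ms
    # Cache miss: new user or expired entry
    recs = get_category_top(category, num_items) if category else get_trending(num_items)
    r.set(f"recs:{user_id}", json.dumps({"products": recs}), ex=24 * 3600)  # backfill
    return recs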
Solution: Re-ranking After Similarity¶
Initial ranking (by embedding similarity):
[Product_A (score 0.95), Product_B (0.92), Product_C (0.91), ...]
Apply re-ranking filters:
1. Diversity: Remove duplicates in category
→ Don't recommend 5 laptop chargers
→ Include variety (3 electronics, 2 books, 2 home, etc.)
2. Business rules:
→ Boost items with high margin
→ Demote out-of-stock items
→ Enforce minimum diversity
3. Freshness: Prefer recently updated products
→ New items get small boost
→ Overstocked items get boost
4. User context: Personalize by session
→ If browsing shoes: boost shoe recommendations
→ If recently purchased camera: recommend camera lenses
(cross-sell, not competing recommendations)
Final ranking:
[Laptop (0.95, boosted margin),
Camera Lens (0.88, cross-sell),
Book (0.85, diversity),
Phone Case (0.82, accessories),
...]
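One way to implement this pass is a greedy pick over adjusted scores with a per-category cap; the margin-boost weight and the cap of 3 below are assumptions, not business rules from the requirements:

# Sketch of re-ranking: greedy selection by adjusted score with a per-category
# diversity cap; out-of-stock items are dropped, high-margin items boosted.
from collections import Counter

def rerank(candidates, max_per_category=3, k=10):
    # candidates: dicts like {"product_id", "score", "category", "margin", "in_stock"}
    def adjusted(c):
        return c["score"] * (1.0 + 0.1 * c["margin"])   # assumed margin-boost weight
    picked, per_cat = [], Counter()
    for c in sorted(candidates, key=adjusted, reverse=True):
        if not c["in_stock"]:
            continue                                     # demote out-of-stock items
        if per_cat[c["category"]] >= max_per_category:
            continue                                     # enforce diversity
        picked.append(c)
        per_cat[c["category"]] += 1
        if len(picked) == k:
            break
    return picked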
Solution: Handling Model Versioning & A/B Tests¶
Live serving with multiple models:
User enters app:
1. Check A/B test assignment
user_in_test = redis.get("ab_test:{user_id}")
2. If in test:
→ Serve Model B (treatment)
20% of users in "treatment"
3. If not in test:
→ Serve Model A (control, production winner)
80% of users get proven model
4. Log impression (which model served)
→ Later, track CTR for each model
Example (A/B test):
Test starts: Model_Collab vs Model_NeuralNet
Duration: 7 days, 10% traffic each
Results:
Model_Collab: CTR = 2.1%
Model_NeuralNet: CTR = 2.3%
→ NeuralNet wins (p-value < 0.05)
→ Promote NeuralNet to 100%, retire Collab
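Assignment has to be sticky so a user never flips between models mid-test. A sketch using a hash bucket plus the ab_test:{user_id} key from the caching section; the split percentage and 30-day TTL mirror the numbers elsewhere in this document:

# Sketch of sticky A/B assignment: hash (test_id, user_id) into a bucket so the
# same user always gets the same model, then pin the assignment in Redis.
import hashlib
import json
import redis

r = redis.Redis()

def assign_model(user_id, test_id, treatment_model, control_model, treatment_pct=20):
    key = f"ab_test:{user_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)["model_id"]             # keep the existing assignment
    bucket = int(hashlib.md5(f"{test_id}:{user_id}".encode()).hexdigest(), 16) % 100
    model_id = treatment_model if bucket < treatment_pct else control_model
    r.set(key, json.dumps({"test_id": test_id, "model_id": model_id}), ex=30 * 24 * 3600)
    return model_id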
Step 4: Persistence Layer, Caching & Monitoring¶
Database Design¶
CREATE TABLE user_events (
event_id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
product_id BIGINT NOT NULL,
event_type VARCHAR(50), -- CLICK, PURCHASE, WISHLIST, VIEW
session_id VARCHAR(255),
timestamp BIGINT, -- milliseconds for precision
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_events_user_product_time
    ON user_events(user_id, product_id, timestamp DESC);
-- Used for training (select events from yesterday)
CREATE INDEX idx_events_created_date
    ON user_events(DATE(created_at));
CREATE TABLE user_product_interactions (
user_id BIGINT NOT NULL REFERENCES users(user_id),
product_id BIGINT NOT NULL REFERENCES products(product_id),
clicks INT DEFAULT 0,
purchases INT DEFAULT 0,
views INT DEFAULT 0,
view_duration_seconds INT DEFAULT 0,
last_interaction BIGINT,
-- Precomputed scores for offline ranking
similarity_score DECIMAL(4,3), -- 0-1, from model
diversity_score DECIMAL(4,3),
PRIMARY KEY (user_id, product_id)
);
-- Model metadata
CREATE TABLE recommendation_models (
model_id VARCHAR(255) PRIMARY KEY,
model_type VARCHAR(100), -- CF, Content-Based, Neural
version INT,
train_date DATE,
accuracy DECIMAL(5,4),
precision DECIMAL(5,4),
recall DECIMAL(5,4),
status VARCHAR(50), -- TRAINING, LIVE, ARCHIVED
promoted_at TIMESTAMP
);
-- A/B test results
CREATE TABLE ab_tests (
test_id BIGSERIAL PRIMARY KEY,
control_model_id VARCHAR(255) REFERENCES recommendation_models(model_id),
treatment_model_id VARCHAR(255) REFERENCES recommendation_models(model_id),
start_at TIMESTAMP,
end_at TIMESTAMP,
control_ctr DECIMAL(5,4),
treatment_ctr DECIMAL(5,4),
pvalue DECIMAL(6,5), -- statistical significance
winner VARCHAR(50), -- control, treatment, inconclusive
created_at TIMESTAMP DEFAULT NOW()
);
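The user_product_interactions table holds the aggregated score from the entity model (score = (clicks + 3*purchases) / recency_factor). A sketch of that computation with an exponential recency decay; the 30-day half-life is an assumption:

# Sketch of the aggregated interaction score; purchases weighted 3x as in the
# entity model, recency_factor implemented as an exponential decay (assumed).
import math
import time

def interaction_score(clicks, purchases, last_interaction_ms, half_life_days=30):
    age_days = (time.time() * 1000 - last_interaction_ms) / 86_400_000
    recency_factor = 2 ** (age_days / half_life_days)   # doubles every half-life
    return (clicks + 3 * purchases) / recency_factor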
Caching Strategy¶
Tier 1: Redis (Hot Cache)
1. Precomputed Recommendations (generated offline)
Key: "recs:{user_id}"
Value: {
products: [product_id_1, product_id_2, ..., product_id_10],
scores: [0.95, 0.92, 0.91, ...],
reasons: ["Customers like you", "Popular in...", ...],
model_version: "v5"
}
TTL: 24 hours (refresh nightly)
Purpose: Sub-5ms serving
Hit rate: 99% (updated daily for most users)
2. User Embeddings (from trained model)
Key: "embedding:{user_id}"
Value: [0.123, -0.456, 0.789, ...] (128D vector)
TTL: 24 hours
Purpose: For online fallback (new users, cache miss)
3. Product Embeddings (from trained model)
Key: "embedding:product:{product_id}"
Value: [0.456, 0.123, -0.789, ...] (128D vector)
TTL: 24 hours
Purpose: For online similarity computation
4. A/B Test Assignments
Key: "ab_test:{user_id}"
Value: {test_id: 123, model_id: "model_v5_neural"}
TTL: 30 days (duration of test)
Purpose: Consistent model assignment for user
Tier 2: Offline Storage
S3 Bucket: recommendation-models/
├─ model_v1_collab_filter.pkl (100MB)
├─ model_v5_neural_net.onnx (500MB)
├─ embeddings_v5_users.bin (1.5GB)
├─ embeddings_v5_products.bin (500MB)
After training nightly:
1. Train model → 50GB intermediate data (on GPU cluster)
2. Serialize → model_v6.onnx (500MB)
3. Upload to S3
4. Load into Redis for serving
5. Archive old models (keep last 5 versions)
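A sketch of the hand-off in steps 2-4 (serialize → upload → load into Redis), assuming boto3 for S3 and the bucket/key layout listed above; embedding writes would be pipelined or batched in practice:

# Sketch of the nightly artifact hand-off: upload the serialized model to S3
# and load user embeddings into Redis for online fallback.
import boto3
import numpy as np
import redis

s3 = boto3.client("s3")
r = redis.Redis()

def publish_model(version, model_path, user_ids, user_embeddings):
    s3.upload_file(model_path, "recommendation-models", f"model_v{version}.onnx")
    for uid, vec in zip(user_ids, user_embeddings):
        r.set(f"embedding:{uid}",
              np.asarray(vec, dtype=np.float32).tobytes(),
              ex=24 * 3600)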
Training Pipeline (Batch)¶
# Simplified training pseudocode (database, build_matrix, matrix_factorization,
# evaluate, save_to_s3, cache, active_users and test_events are placeholders)
from datetime import timedelta

def train_recommendation_model(training_date):
    # Select events from the previous day
    events = database.query("""
        SELECT user_id, product_id, event_type
        FROM user_events
        WHERE DATE(created_at) = ?
    """, training_date - timedelta(days=1))

    # Create user-product interaction matrix
    # (rows = users, columns = products, values = interaction strength)
    interaction_matrix = build_matrix(events)

    # Algorithm: Matrix Factorization (Collaborative Filtering)
    # Factor into user_embeddings (100M x 128D) and product_embeddings (10M x 128D)
    # This captures patterns like "users who liked X also liked Y"
    user_embeddings, product_embeddings = matrix_factorization(
        interaction_matrix,
        factors=128,
        iterations=10,
        learning_rate=0.01,
    )

    # Validate on a held-out test set
    accuracy, precision, recall = evaluate(
        user_embeddings,
        product_embeddings,
        test_events,
    )

    # Save model artifact
    model = RecommendationModel(
        user_embeddings,
        product_embeddings,
        metadata={"accuracy": accuracy, "precision": precision, "recall": recall},
    )
    save_to_s3("models/recommendation_v6.pkl", model)

    # Generate recommendations for all active users
    for user_id in active_users:
        user_vec = user_embeddings[user_id]
        scores = dot_product(user_vec, product_embeddings)   # score against 10M products
        top_10 = argsort(scores)[-10:][::-1]                 # highest scores first
        cache.set(f"recs:{user_id}", top_10, ttl=24 * 3600)

    return model

# Run nightly (11 PM - 3 AM) via the job scheduler (e.g. cron or Airflow)
Monitoring & Alerts¶
Key Metrics:
- Model Quality
  - Click-through rate (CTR %, should improve with new model)
  - Conversion rate (% of recommendations that lead to a purchase)
  - Precision@10 (are the top 10 recs relevant?)
  - Recall (of all relevant items, how many appear in the top 10?)
- Online Serving
  - Recommendation latency (P95 <100ms target)
  - Cache hit rate (should be >99%)
  - Fallback rate (% of requests hitting a fallback tier)
- Training Health
  - Training job completion (nightly, should finish in <4 hours)
  - Model convergence (is the loss decreasing?)
  - Data quality (unexpected events or bot activity?)
- A/B Test Results
  - Treatment CTR vs control (tracked for the test duration)
  - Statistical significance (p-value < 0.05)
  - Sample size (sufficient power to detect a difference?)
- Business Metrics
  - Revenue per user (recommendations drive sales)
  - Diversity of recommendations (not all the same category)
  - User engagement (time on site, return rate)
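Two of the model-quality metrics above are simple enough to sketch directly; precision@10 compares the served list against held-out purchases, and CTR comes from impression and click counts:

# Sketch of two model-quality metrics: precision@10 and observed CTR.
def precision_at_k(recommended, relevant, k=10):
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k

def click_through_rate(impressions, clicks):
    return clicks / impressions if impressions else 0.0

# Example: 3 of the top-10 recs were later purchased -> precision@10 = 0.3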
- alert: ModelTrainingFailed
expr: model_training_status == FAILED
annotations: "Training job failed — check data pipeline"
- alert: CTRRegression
expr: current_ctr < baseline_ctr * 0.95
annotations: "CTR dropped 5% — new model underperforming"
- alert: ServingLatencyHigh
expr: recommendation_latency_p95 > 200
annotations: "Rec latency > 200ms — check cache, model size"
- alert: CacheHitRateLow
expr: cache_hit_rate < 0.90
annotations: "Cache hit < 90% — precomputation not covering users"
- alert: ColdStartFallback
expr: fallback_tier_rate > 0.20
annotations: "20% requests using fallback — many new/inactive users"
⚡ Quick Reference Cheat Sheet¶
Critical Design Decisions¶
- Offline batch training — Model trained nightly, not real-time (too slow)
- Precomputed recommendations — Store in cache, serve <5ms, not computed on-demand
- Multi-tier fallback — Personalized → Category → Trending for cold start
- Bucketing by user segment — Don't generate recs for ALL users daily (optimize)
- Re-ranking for diversity — Similarity scores top-10, then filter for variety
- A/B testing framework — Validate improvements before rolling out
Algorithm Comparison¶
| Algorithm | Data Needed | Latency | Accuracy | Cold Start |
|---|---|---|---|---|
| Collaborative Filter | User history | 1ms (precomputed) | High | Poor (new users) |
| Content-Based | Product features | 1ms (precomputed) | Medium | Good |
| Hybrid | Both | 5ms (blend models) | High | Good |
| Trending | Global popularity | <1ms | Lower | Good (fallback) |
When to Use What¶
| Use Case | Algorithm | Why |
|---|---|---|
| Returning user | Collaborative Filter | "Users like you also bought" |
| Browsing category | Content-Based | Similar items in category |
| New user | Trending + Category | No history available |
| Cold product | Content-Based | No interaction history |
| A/B testing | Both models | Measure improvement |
Tech Stack¶
Frontend: Show recommendations in sidebar, carousel
Backend: Stateless, cache lookups only
ML Platform: Spark/TensorFlow for batch training
Model Storage: S3 + Redis cache
Database: PostgreSQL (events, interaction matrix)
Monitoring: A/B test dashboards, CTR tracking
🎯 Interview Summary (5 Minutes)¶
- Cold start → Multi-tier fallback (personalized → category → trending)
- Scalability → Offline batch training nightly, precompute recommendations
- Fast serving → Store in Redis, cache hit 99%, sub-5ms response
- Bucketing → Segment users by geography/device to reduce compute
- Re-ranking → Similarity scores for relevance, diversity filters for variety
- A/B testing → Validate improvements, measure CTR impact
- Online learning → Log events, retrain daily, update cache every 24 hours