Uber Backend¶
Interview Time: 60-90 min | Difficulty: Hard
Key Focus: Real-time location tracking, matching algorithm, surge pricing, payment consistency
Step 1: Functional & Non-Functional Requirements¶
Functional Requirements¶
- Users (passenger/driver) register and create profile
- Passengers can request ride (with pickup/dropoff location)
- System matches nearby drivers to passenger requests
- Drivers accept/decline ride requests
- Real-time location tracking during ride
- Passenger and driver can cancel rides (a cancellation fee may apply if the driver is already close to the pickup point)
- Payment processing at end of ride
- Trip history and ratings
- Surge pricing based on demand
- Driver status (online, offline, on-ride, completing ride)
Non-Functional Requirements¶
| Requirement | Target | Notes |
|---|---|---|
| Scale | 100M users, 10M daily active drivers | Peak: 1M concurrent drivers online |
| Latency | Match in <30 sec, location update <1 sec | User-facing latency critical |
| Availability | 99.95% uptime | Downtimes cost revenue |
| Consistency | Strong for payment, eventual for location | No double-charging, eventually accurate location |
| Throughput | 1000 requests/sec, 10M location updates/sec | Location updates asynchronous |
Step 2: API Design, Data Model & High-Level Design¶
Core API Endpoints¶
POST /passengers/request-ride
{passenger_id, pickup: {lat, lng}, dropoff: {lat, lng}, ride_type: UberX|UberXL}
→ {ride_request_id, estimated_price, eta_in_seconds, surge_multiplier}
GET /ride-requests/{ride_request_id}/status
→ {status: MATCHING|ASSIGNED|IN_PROGRESS|COMPLETED, driver_id?, location?, eta?}
PUT /ride-requests/{ride_request_id}/cancel
{reason: CHANGED_MIND|DRIVER_NOT_HERE}
→ {cancellation_fee: decimal, refund_amount: decimal}
POST /drivers/location-update
{driver_id, location: {lat, lng, bearing}, timestamp}
→ {status: ACK, battery_level?, online_status?}
POST /drivers/ride-requests/{ride_request_id}/accept
{driver_id}
→ {success: true/false, ride_id, passenger_info}
POST /rides/{ride_id}/complete
{driver_id, final_location: {lat, lng}, final_price}
→ {ride_id, payment_status: SUCCEEDED, receipt}
GET /drivers/nearby-requests
{driver_id, location: {lat, lng}, radius_km: 5}
→ {requests: [{ride_request_id, pickup, dropoff, estimated_price, surge}]}
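To make the request/response shapes above concrete, here is a minimal client-side sketch of the happy path: request a ride, then poll its status until a driver is assigned. The endpoint paths follow the API above; the base URL, auth header, and field values are illustrative assumptions, not a real deployment.

```python
import time
import requests  # assumed HTTP client

BASE = "https://api.example.com"                  # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}     # auth scheme is an assumption

# 1. Request a ride (POST /passengers/request-ride)
resp = requests.post(
    f"{BASE}/passengers/request-ride",
    json={
        "passenger_id": 42,
        "pickup": {"lat": 37.7749, "lng": -122.4194},
        "dropoff": {"lat": 37.7849, "lng": -122.4094},
        "ride_type": "UberX",
    },
    headers=HEADERS,
    timeout=5,
)
ride_request_id = resp.json()["ride_request_id"]

# 2. Poll status until matching finishes (GET /ride-requests/{id}/status)
while True:
    status = requests.get(
        f"{BASE}/ride-requests/{ride_request_id}/status",
        headers=HEADERS,
        timeout=5,
    ).json()
    if status["status"] != "MATCHING":
        break
    time.sleep(2)  # the production app would rely on WebSocket push instead of polling

print(status)  # e.g. {"status": "ASSIGNED", "driver_id": ..., "eta": ...}
```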
Entity Data Model¶
PASSENGERS
├─ user_id (PK)
├─ phone, email, name, rating (avg stars)
├─ payment_methods (JSON: [card, wallet])
├─ ride_history_count, created_at
DRIVERS
├─ driver_id (PK)
├─ phone, email, name, vehicle_info (JSON)
├─ rating, trips_completed, status (ONLINE|OFFLINE|ON_RIDE)
├─ documents: {license_url, insurance_url, background_check: bool}
├─ created_at
DRIVER_LOCATIONS (ephemeral, hot data)
├─ driver_id (PK)
├─ location (GEOGRAPHY: point, indexed for spatial queries)
├─ bearing (direction), accuracy
├─ timestamp, server_timestamp
RIDE_REQUESTS
├─ ride_request_id (PK)
├─ passenger_id (FK), driver_id (FK, nullable until matched)
├─ pickup_location (GEOGRAPHY: point)
├─ dropoff_location (GEOGRAPHY: point)
├─ ride_type (UberX, UberXL)
├─ status (MATCHING|ASSIGNED|IN_PROGRESS|COMPLETED|CANCELLED)
├─ base_fare (decimal), surge_multiplier (float)
├─ created_at, completed_at, cancelled_at
RIDES (completed trips)
├─ ride_id (PK)
├─ ride_request_id (FK) — denormalized for history
├─ passenger_id (FK), driver_id (FK)
├─ pickup_location, dropoff_location
├─ actual_distance_km (calculated), duration_seconds
├─ base_fare, surge_multiplier, tip, tax
├─ total_price, payment_status (PENDING|SUCCEEDED|FAILED)
├─ payment_method_id (FK)
├─ passenger_rating (1-5), driver_rating (1-5)
├─ completed_at
RATINGS
├─ ride_id (FK), rater_id (FK)
├─ rating (1-5), comment (text)
├─ created_at
SURGE_PRICING_METRICS (for demand-based pricing)
├─ region_id (geo-hash), timestamp (minute-level)
├─ requests_pending (count), drivers_online (count)
├─ surge_multiplier (1.0 - 5.0)
├─ updated_at
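The surge job (a 5-minute batch in the architecture below) can derive a per-region multiplier from the two counters in SURGE_PRICING_METRICS. The ratio-based heuristic below is purely illustrative and not Uber's actual pricing model; only the 1.0-5.0 clamping range comes from the table above.

```python
def surge_multiplier(requests_pending: int, drivers_online: int) -> float:
    """Illustrative demand/supply heuristic for one geo-hash region.

    Not the real pricing formula -- just a sketch of how a batch job could
    turn the SURGE_PRICING_METRICS counters into a bounded multiplier.
    """
    if drivers_online == 0:
        return 5.0  # no supply at all: cap at the maximum multiplier
    demand_supply_ratio = requests_pending / drivers_online
    # Below ~1 pending request per driver there is no surge; above that,
    # price scales with the ratio, clamped to the 1.0-5.0 range.
    return max(1.0, min(5.0, demand_supply_ratio))

# Example: 120 pending requests, 40 drivers online -> 3.0x surge
print(surge_multiplier(120, 40))
```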
High-Level Architecture¶
graph TB
Passenger["📱 Passenger App"]
Driver["📱 Driver App"]
LB["Load Balancer"]
RIDE_REQUEST["Ride Request Service"]
MATCHING["Matching Engine<br/>(Redis, real-time)"]
LOCATION["Location Service<br/>(WebSocket, streaming)"]
PAYMENT["Payment Service<br/>(Stripe/PayPal)"]
RATING["Rating Service"]
CACHE["Redis Cluster<br/>(locations, active requests,<br/>session state)"]
GEO_INDEX["PostgreSQL w/<br/>PostGIS Extension<br/>(spatial indexing)"]
HISTORY_DB["NoSQL DB<br/>(trip history)"]
KAFKA["Kafka<br/>(location stream,<br/>ride events)"]
SURGE_JOB["Surge Pricing Job<br/>(batch every 5 min)"]
Passenger -->|Request Ride| LB
Driver -->|Location Updates| LB
LB --> RIDE_REQUEST
LB --> LOCATION
LB --> PAYMENT
LB --> RATING
RIDE_REQUEST --> MATCHING
RIDE_REQUEST --> GEO_INDEX
MATCHING --> CACHE
LOCATION --> KAFKA
LOCATION --> CACHE
GEO_INDEX --> HISTORY_DB
PAYMENT --> KAFKA
KAFKA --> SURGE_JOB
SURGE_JOB --> CACHE
Step 3: Concurrency, Consistency & Scalability¶
🔴 Problem: Race Condition on Ride Acceptance¶
Scenario: A passenger requests a ride. The matching engine offers it to 3 nearby drivers at once (for redundancy), and all 3 accept within ~100ms of each other. Without coordination, the ride could be assigned to more than one driver, or the losing drivers are never told it was taken.
Solution: Distributed Lock on Ride Request
1. Ride request enters MATCHING state
2. Matching engine finds 3 drivers (within 2km, high rating)
3. Push notification sent to all 3 drivers
4. Driver 1 hits "Accept" button
→ HTTP POST /ride-requests/{id}/accept
5. [CRITICAL SECTION]
SET lock (atomic operation in Redis)
lock_key: "ride:{id}:acceptance_lock"
value: driver_1_id
TTL: 5 seconds
If SET succeeds:
→ Driver 1 acquires lock
→ Update ride_request.driver_id = driver_1_id, status = ASSIGNED
→ Send "ASSIGNED" to driver 1 (websocket)
→ Send "RIDE_TAKEN" to drivers 2,3 (websocket)
→ Send ETA to passenger (websocket)
→ RETURN success to driver 1
If SET fails (lock already held):
→ Another driver's acceptance in progress
→ RETURN error: "Ride already accepted by another driver"
→ Client shows toast: "This ride was matched to another driver"
→ Driver 2/3 removed from the ride request queue
6. After acceptance, send location-only updates (no more driver search)
Why Redis SET NX (set if Not eXists)?
- Atomic: no race between the existence check and the set
- Sub-millisecond: the lock decision never depends on stale state replicated between app servers
- Auto-expiry: if the handler crashes, the lock releases after 5 seconds (the driver app can retry)
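A minimal sketch of the acceptance path with redis-py, assuming the lock key format and 5-second TTL above; the database update and WebSocket notifications are stubbed out as prints, and the Redis connection settings are placeholders.

```python
import redis

r = redis.Redis()            # assumes a reachable Redis; host/port are placeholders
ASSIGN_TTL_SECONDS = 5       # matches the 5-second lock TTL in the flow above


def accept_ride(ride_request_id: int, driver_id: int) -> dict:
    lock_key = f"ride:{ride_request_id}:acceptance_lock"

    # SET key value NX EX 5 -- atomic "set only if not exists" with auto-expiry,
    # so a crashed handler can never hold the lock for more than 5 seconds.
    acquired = r.set(lock_key, driver_id, nx=True, ex=ASSIGN_TTL_SECONDS)

    if not acquired:
        # Another driver's acceptance already won (or is in flight).
        return {"success": False, "error": "Ride already accepted by another driver"}

    # Winner path. In the real service these are a DB transaction plus
    # WebSocket pushes; stubbed as prints in this sketch.
    print(f"UPDATE ride_requests SET driver_id={driver_id}, status='ASSIGNED' "
          f"WHERE ride_request_id={ride_request_id}")
    print(f"notify driver {driver_id}: ASSIGNED; notify other candidates: RIDE_TAKEN")
    return {"success": True, "ride_request_id": ride_request_id}
```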
🟡 Problem: Double Charging in Payment Race Condition¶
Scenario: The driver ends the ride and the completion request is retried over a flaky network (or the passenger's cancellation arrives at the same moment). Without protection, the payment is processed twice.
Solution: Idempotent Payment Processing
Payment request includes:
{ride_id, driver_id, passenger_id, amount, timestamp, idempotency_key}
idempotency_key = SHA256(
ride_id +
payment_method_id +
amount +
timestamp_to_minute
)
Payment Service cache (Redis):
KEY: idempotency_key
VALUE: {payment_id, status, amount, timestamp}
TTL: 24 hours
Sequence:
1. Ride ends, driver app sends POST /rides/{id}/complete
2. Payment service generates idempotency_key
3. Check Redis: "idempotency_key" exists?
YES (duplicate request):
→ Return cached payment_id
→ Log warning (duplicate detected)
→ No new charge
NO (first time):
→ Call Stripe API with idempotency_key
→ Stripe also checks idempotency (Stripe deduplicates on its end)
→ Cache result in Redis
→ Return {payment_id, status: SUCCEEDED}
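A sketch of this check, assuming redis-py for the cache and the Stripe Python SDK for the charge. The key derivation mirrors the SHA-256 recipe above; parameter names such as `amount_cents` and the exact PaymentIntent fields are illustrative, and the 24-hour TTL follows the cache description.

```python
import hashlib
import json

import redis
import stripe  # official Stripe SDK; stripe.api_key is assumed to be configured elsewhere

r = redis.Redis()
IDEMPOTENCY_TTL = 24 * 3600  # 24 hours, as described above


def charge_ride(ride_id, payment_method_id, amount_cents, timestamp_to_minute):
    # Same inputs -> same key, so a retried completion maps to the same payment.
    raw = f"{ride_id}:{payment_method_id}:{amount_cents}:{timestamp_to_minute}"
    idempotency_key = hashlib.sha256(raw.encode()).hexdigest()

    # 1. Duplicate request? Return the cached result, charge nothing new.
    cached = r.get(f"payment:{idempotency_key}")
    if cached:
        return json.loads(cached)  # {"payment_id": ..., "status": ..., "amount": ...}

    # 2. First time: call Stripe, which also deduplicates on this key server-side.
    intent = stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        payment_method=payment_method_id,
        confirm=True,
        idempotency_key=idempotency_key,
    )

    result = {"payment_id": intent.id, "status": intent.status, "amount": amount_cents}
    r.set(f"payment:{idempotency_key}", json.dumps(result), ex=IDEMPOTENCY_TTL)
    return result
```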
Solution: Consistency Levels by Data Type¶
| Data | Consistency | Strategy |
|---|---|---|
| Ride acceptance | Strong | Redis distributed lock |
| Driver location | Eventual OK | Async Kafka, eventual DB write |
| Payment | Strong | Idempotent + external gateway |
| Ride completion | Strong ACID | DB transaction |
| Surge pricing | Eventual OK | Batch job every 5 minutes |
| Ratings | Eventual OK | Async processing |
Scalability: Handling 10M Location Updates/sec¶
Problem: Each driver sends a location update every 5-10 seconds. At peak, 1M concurrent drivers × 1 update per 5 sec ≈ 200K updates/sec hitting the backend, and the pipeline must scale well beyond that (the NFR table targets up to 10M updates/sec).
Solution: Multi-tier Buffering
Driver phone:
→ Batch 5 location samples into 1 message instead of sending each point individually
→ Send the batch on a ~10-second cadence rather than per sample
→ Cuts message volume 5× (200K individual updates/sec become ~40K messages/sec)
Location Service (stateless, auto-scaled):
→ Receive location messages
→ Write to Kafka (async, fire-and-forget)
→ Return ACK immediately (latency <50ms)
→ NO direct DB write (would bottleneck)
Kafka (high throughput):
→ Buffer: 10M messages/sec
→ Partition by driver_id (keeps driver's location stream ordered)
Stream Processor:
→ Consume Kafka stream
→ Aggregate: last location per driver
→ Write to Redis (hot cache) — O(1) update
→ Write batches to PostgreSQL (30-sec batches)
→ Reduces 200K writes/sec to 7K batches/sec
Redis (cached locations):
→ Available for immediate queries
→ "Where are nearby drivers?" — Redis geo-radius in <10ms
→ Refresh every 30 seconds from stream processor
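A condensed stream-processor sketch under these assumptions: kafka-python for consumption, redis-py (≥ 4) for the geo cache, and psycopg2 for the 30-second batched writes. The topic name, message fields, connection settings, and table/column names (taken from the Step 4 schema) are assumptions, not a fixed contract.

```python
import json
import time

import psycopg2
import redis
from kafka import KafkaConsumer               # kafka-python; topic/bootstrap are placeholders
from psycopg2.extras import execute_values

consumer = KafkaConsumer(
    "driver-locations",                        # assumed topic, partitioned by driver_id
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode()),
)
r = redis.Redis()
pg = psycopg2.connect("dbname=locations")      # placeholder DSN

BATCH_WINDOW_SECONDS = 30
latest = {}                                    # driver_id -> latest (lon, lat, ts)
last_flush = time.time()

for msg in consumer:
    update = msg.value                         # assumed shape: {"driver_id", "lat", "lng", "ts"}
    driver_id = update["driver_id"]
    latest[driver_id] = (update["lng"], update["lat"], update["ts"])

    # Hot path: keep the Redis geo index current so "nearby drivers" stays <10ms.
    # redis-py >= 4 geoadd takes a flat (lon, lat, member) sequence.
    r.geoadd("driver:locations", (update["lng"], update["lat"], driver_id))

    # Cold path: flush the last-known location per driver to PostGIS every ~30 seconds,
    # collapsing several raw updates per driver into one batched upsert.
    if time.time() - last_flush >= BATCH_WINDOW_SECONDS:
        rows = [(d, lon, lat, ts) for d, (lon, lat, ts) in latest.items()]
        with pg.cursor() as cur:
            execute_values(cur, """
                INSERT INTO driver_locations (driver_id, location, timestamp)
                VALUES %s
                ON CONFLICT (driver_id) DO UPDATE
                  SET location = EXCLUDED.location, timestamp = EXCLUDED.timestamp
            """, rows, template="(%s, ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, %s)")
        pg.commit()
        latest.clear()
        last_flush = time.time()
```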
Step 4: Persistence Layer, Caching & Monitoring¶
Database Design¶
-- Passengers & Drivers (write-once, slow-moving data)
CREATE TABLE passengers (
user_id BIGSERIAL PRIMARY KEY,
phone VARCHAR(20) UNIQUE,
email VARCHAR(255),
name VARCHAR(255),
rating DECIMAL(3,2),
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE drivers (
driver_id BIGSERIAL PRIMARY KEY,
phone VARCHAR(20) UNIQUE,
email VARCHAR(255),
name VARCHAR(255),
vehicle_id VARCHAR(50),
rating DECIMAL(3,2),
status VARCHAR(20), -- ONLINE, OFFLINE, ON_RIDE (or a PostgreSQL ENUM created via CREATE TYPE)
created_at TIMESTAMP DEFAULT NOW()
);
-- Live Driver Locations (high-volume, hot data)
-- Use separate PostgreSQL instance with PostGIS extension
CREATE TABLE driver_locations (
driver_id BIGINT PRIMARY KEY REFERENCES drivers(driver_id),
location GEOGRAPHY(POINT, 4326), -- Lat/Lng with spatial index
bearing INT, -- 0-359 degrees
accuracy INT, -- meters
timestamp BIGINT, -- milliseconds
updated_at TIMESTAMP DEFAULT NOW()
);
-- PostGIS spatial index for fast geo queries
CREATE INDEX idx_driver_locations_geo
ON driver_locations USING GIST(location);
-- Ride Requests (transactional, strong ACID)
CREATE TABLE ride_requests (
ride_request_id BIGSERIAL PRIMARY KEY,
passenger_id BIGINT NOT NULL REFERENCES passengers(user_id),
driver_id BIGINT REFERENCES drivers(driver_id),
pickup_location GEOGRAPHY(POINT, 4326),
dropoff_location GEOGRAPHY(POINT, 4326),
ride_type VARCHAR(20), -- UberX, UberXL
status VARCHAR(20), -- MATCHING, ASSIGNED, IN_PROGRESS, COMPLETED, CANCELLED
base_fare DECIMAL(8,2),
surge_multiplier DECIMAL(3,2) DEFAULT 1.0,
created_at TIMESTAMP DEFAULT NOW(),
matched_at TIMESTAMP,
completed_at TIMESTAMP
);
CREATE INDEX idx_ride_requests_status_created
ON ride_requests(status, created_at DESC);
-- Ride History (immutable log, denormalized for performance)
CREATE TABLE rides (
ride_id BIGSERIAL PRIMARY KEY,
ride_request_id BIGINT UNIQUE REFERENCES ride_requests(ride_request_id),
passenger_id BIGINT NOT NULL REFERENCES passengers(user_id),
driver_id BIGINT NOT NULL REFERENCES drivers(driver_id),
pickup_location GEOGRAPHY(POINT, 4326),
dropoff_location GEOGRAPHY(POINT, 4326),
actual_distance_km DECIMAL(6,2),
duration_minutes INT,
base_fare DECIMAL(8,2),
surge_multiplier DECIMAL(3,2),
tip DECIMAL(8,2) DEFAULT 0,
tax DECIMAL(8,2),
total_price DECIMAL(8,2),
payment_status VARCHAR(20), -- SUCCEEDED, FAILED, REFUNDED
payment_id VARCHAR(255),
completed_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_rides_passenger_created
ON rides(passenger_id, created_at DESC);
CREATE INDEX idx_rides_driver_created
ON rides(driver_id, created_at DESC);
-- Ratings (slow-moving, eventual consistency OK)
CREATE TABLE ratings (
rating_id BIGSERIAL PRIMARY KEY,
ride_id BIGINT REFERENCES rides(ride_id),
rater_id BIGINT, -- passenger or driver
ratee_id BIGINT, -- driver or passenger
rating INT, -- 1-5
comment TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_ratings_ride_id ON ratings(ride_id);
CREATE INDEX idx_ratings_ratee_id ON ratings(ratee_id);
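For matching against the persistent store (rather than the Redis cache), the GIST index on driver_locations supports a radius query like the sketch below, here wrapped in psycopg2. The 2 km radius and ONLINE filter mirror the matching flow earlier; the DSN and LIMIT are illustrative.

```python
import psycopg2

FIND_NEARBY_DRIVERS = """
SELECT d.driver_id,
       ST_Distance(dl.location,
                   ST_SetSRID(ST_MakePoint(%(lng)s, %(lat)s), 4326)::geography) AS meters
FROM driver_locations dl
JOIN drivers d ON d.driver_id = dl.driver_id
WHERE d.status = 'ONLINE'
  AND ST_DWithin(dl.location,
                 ST_SetSRID(ST_MakePoint(%(lng)s, %(lat)s), 4326)::geography,
                 %(radius_m)s)  -- served by the GIST index on location
ORDER BY meters
LIMIT 10;
"""


def nearby_drivers(conn, lat: float, lng: float, radius_m: int = 2000):
    """Return up to 10 online drivers within radius_m metres of the pickup point."""
    with conn.cursor() as cur:
        cur.execute(FIND_NEARBY_DRIVERS, {"lat": lat, "lng": lng, "radius_m": radius_m})
        return cur.fetchall()  # [(driver_id, meters), ...]


# conn = psycopg2.connect("dbname=uber")   # placeholder DSN
# print(nearby_drivers(conn, 37.7749, -122.4194))
```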
Caching Strategy¶
Tier 1: Redis (Hot Cache)
1. Driver Locations (Geo-indexed)
Key: "driver:locations" (Redis sorted set with geospatial index)
Structure: GEOADD driver:locations {lon} {lat} {driver_id}
Query: GEORADIUS driver:locations {lon} {lat} {radius} km WITHDIST
TTL: 60 seconds (refresh from stream processor)
Purpose: Fast "Find nearby drivers" queries (<10ms); see the query sketch at the end of this section
2. Active Ride Requests (for matching)
Key: "ride:requests:matching"
Value: {ride_request_id: {pickup, dropoff, surge_mult, created_ts}}
TTL: 5 minutes (removed once assigned or expired)
Purpose: Matching engine queries for unmatched requests
3. Driver Status + Basic Info
Key: "driver:{driver_id}:status"
Value: {status: ON_RIDE|ONLINE|OFFLINE, location, current_ride_id}
TTL: 30 seconds
Purpose: Fast status checks without DB query
4. Ride Acceptance Locks (time-limited)
Key: "ride:{ride_request_id}:acceptance_lock"
Value: {driver_id, timestamp}
TTL: 5 seconds (auto-expire if handler crashes)
Purpose: Prevent race condition on ride acceptance
Tier 2: Database
- PostgreSQL (ride/passenger/driver data)
- PostGIS extension for spatial queries on historical data
- Archive old locations to compressed storage (>30 days: S3)
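The "find nearby drivers" lookup from cache item 1 can be served entirely from Redis. A small sketch with redis-py, using GEOSEARCH (the non-deprecated successor to GEORADIUS, requiring Redis ≥ 6.2); the key name matches the cache design above, and the 2 km default radius follows the matching flow.

```python
import redis

r = redis.Redis()  # assumes the same Redis cluster that holds "driver:locations"


def nearby_driver_ids(lat: float, lng: float, radius_km: float = 2.0):
    """Radius query against the geo-indexed set maintained by the stream processor."""
    # GEOSEARCH driver:locations FROMLONLAT {lng} {lat} BYRADIUS {radius} km ASC WITHDIST
    results = r.geosearch(
        "driver:locations",
        longitude=lng,
        latitude=lat,
        radius=radius_km,
        unit="km",
        sort="ASC",
        withdist=True,
    )
    # [(b"12345", 0.42), ...] -> [(12345, 0.42), ...]
    return [(int(member), dist) for member, dist in results]
```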
Real-Time Communication: WebSocket¶
Client connections:
- Passenger: listens for driver location and ETA updates
- Driver: listens for new ride requests and passenger cancellations
Server broadcast:
On location update (Kafka stream triggers):
→ Get all passengers with active rides
→ For each passenger, push location to WebSocket connection
→ Message: {driver_location, eta_minutes, updated_at}
On ride cancellation:
→ Push "RIDE_CANCELLED" to driver
→ Driver immediately available again
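A toy server-side sketch of the push path using the third-party `websockets` package: a registry of passenger connections and a broadcast routine that the Kafka consumer would call on each location event. The message shape follows the broadcast description above; connection registration, auth, reconnects, and multi-node fan-out are all simplified assumptions.

```python
import asyncio
import json

import websockets  # third-party "websockets" package

# passenger_id -> live WebSocket connection (in production this registry is
# sharded across WebSocket gateway nodes, not one in-process dict)
connections = {}


async def handler(ws, path=None):
    hello = json.loads(await ws.recv())          # assumed first frame: {"passenger_id": 42}
    connections[hello["passenger_id"]] = ws
    try:
        await ws.wait_closed()
    finally:
        connections.pop(hello["passenger_id"], None)


async def push_location(passenger_id, driver_location, eta_minutes, updated_at):
    """Called by the Kafka consumer whenever the assigned driver's location changes."""
    ws = connections.get(passenger_id)
    if ws is not None:
        await ws.send(json.dumps({
            "driver_location": driver_location,
            "eta_minutes": eta_minutes,
            "updated_at": updated_at,
        }))


async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()                   # run forever


# asyncio.run(main())
```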
Monitoring & Alerts¶
Key Metrics:
- Ride Matching
    - Average match time (should be <30s for 95th percentile)
    - Match success rate (% of requests that get matched)
    - Drivers available vs pending requests ratio
- Payment Processing
    - Payment success rate (> 99.5%)
    - Duplicate payment incidents (should be 0)
    - Average payment latency
- Driver Utilization
    - % of drivers online
    - Average rides per driver per day
    - Acceptance rate (% of offered rides drivers accept)
- Customer Experience
    - Ride cancellation rate by stage (during matching, after assignment, after pickup)
    - Rating distribution (avg rating > 4.7)
    - Support tickets (payment disputes, safety issues)
- System Health
    - Location update latency (P95 <1 second)
    - WebSocket connection stability
    - Cache hit rate for driver locations (should be >95%)
    - PostGIS query performance (<50ms for geo-radius)
- alert: MatchSuccessRateLow
  expr: match_success_rate < 0.85
  annotations:
    summary: "Match rate dropped below 85% — too few drivers online?"
- alert: PaymentFailureRate
  expr: payment_failure_rate > 0.005
  annotations:
    summary: "Payment failures > 0.5% — investigate payment gateway"
- alert: LocationUpdateLatencyHigh
  expr: location_update_p95 > 2000
  annotations:
    summary: "Location latency > 2s — Kafka or stream processor bottleneck"
- alert: DuplicatePaymentDetected
  expr: duplicate_payments_per_min > 0
  annotations:
    summary: "Duplicate payment detected — review idempotence logic"
⚡ Quick Reference Cheat Sheet¶
Critical Design Decisions¶
- Redis lock on ride acceptance — Prevents multiple drivers accepting same ride
- Idempotent payment processing — Prevents double-charging on flaky networks
- Kafka for location stream — Decouples real-time location from write to DB
- PostGIS spatial indexes — Sub-50ms geo-radius queries for matching
- Eventual consistency for locations — OK because location refreshes every 5-10 seconds
- WebSocket for real-time updates — Push notifications for location/ETA without polling
When to Use What¶
| Need | Technology | Why |
|---|---|---|
| Find drivers nearby | PostGIS geo-index + Redis cache | Sub-50ms query for matching |
| Match drivers to requests | Redis lock + Kafka stream | At most one driver wins each assignment |
| Process payment | Stripe + idempotency key | Deduplicates retries |
| Stream locations | Kafka + buffer 5 updates | Handles 10M updates/sec |
| Real-time ETA/location | WebSocket | Push vs pull reduces latency |
| Driver status consensus | Redis + 30s TTL | Eventual consistency acceptable |
Tech Stack¶
Frontend: React Native (iOS/Android)
Backend: Python/Go (stateless, auto-scaled)
Matching Engine: Go (low-latency, real-time)
Databases:
- PostgreSQL + PostGIS (rides, passengers, drivers)
- Separate PostgreSQL instance (driver locations, high-volume)
- Redis cluster (cache, locks, geo-index)
- NoSQL (trip history archive)
Streaming: Kafka (high-throughput location processing)
Real-time: WebSocket (location/ETA push)
Payment: Stripe API (idempotent)
Monitoring: Prometheus + Grafana
🎯 Interview Summary (5 Minutes)¶
- Ride acceptance race condition → Redis distributed lock (SET NX with TTL)
- Double charging problem → Idempotent payment with idempotency_key + cache
- Matching latency → PostGIS spatial index + Redis geo-cache (sub-50ms)
- 10M location updates/sec → Kafka stream + batching (driver sends every 10s, not every sec)
- Real-time location to passenger → WebSocket push (not polling)
- Strong consistency → Payment gateway (Stripe handles idempotence), DB transactions for rides
- Eventual consistency → Driver locations (refreshed every 30s), ratings