Hotel Booking Service¶

Interview Time: 45 min | Difficulty: Hard
Key Focus: Concurrency, inventory management, double-booking prevention, search scalability

Step 1: Functional & Non-Functional Requirements¶

Functional Requirements¶

Users can search for available hotels by location, date range, and filters
Users can book a room (reserve with payment)
Users can cancel a reservation (full or partial refund)
Users can view booked history and current reservations
Hosts/hotels can manage inventory (add rooms, set pricing, update availability)
Support different room types and prices
Notification on booking confirmation and cancellation

Non-Functional Requirements¶

Requirement	Target	Notes
Throughput	10M users, 50K bookings/day	Peak: 500 QPS during holidays
Latency	Search <200ms, Booking <500ms
Availability	99.95% uptime	Booking failures acceptable (retry)
Consistency	Strong (no overbooking)	Weak for historical data
Data Retention	5 years for billing, 1 year for activity logs

Step 2: API Design, Data Model & High-Level Design¶

Core API Endpoints¶

# Search
GET /search
  ?location=NYC&check_in=2025-01-15&check_out=2025-01-20&guests=2&room_type=STANDARD

# Booking  
POST /bookings
  {hotel_id, room_id, check_in, check_out, guest_info, payment_method}
  → BookingId + confirmation

# Cancel
DELETE /bookings/{booking_id}
  → Processes refund, frees inventory

# Retrieve
GET /bookings/{booking_id}
  → Booking details + current status

Entity Data Model¶

HOTELS
├─ hotel_id (PK)
├─ name, location, address
├─ rating, amenities (array)
├─ created_at, updated_at

ROOMS
├─ room_id (PK)
├─ hotel_id (FK), room_type, capacity
├─ base_price, currency

AVAILABILITY (Time-Series)
├─ room_id (PK), date (PK), price
├─ inventory_count, booked_count
├─ updated_at

BOOKINGS
├─ booking_id (PK), user_id, room_id, hotel_id
├─ check_in (PK), check_out, status (CONFIRMED/CANCELLED)
├─ total_price, payment_status
├─ created_at, updated_at

PAYMENTS
├─ payment_id (PK), booking_id (FK)
├─ amount, status (PENDING/SUCCESS/FAILED/REFUNDED)
├─ payment_method, transaction_id

High-Level Architecture¶

graph TB
    User["👤 User/Browser"]
    LB["Load Balancer"]

    WEB["Web API<br/>(Node.js/Java)"]
    SEARCH["Search Service<br/>(Elasticsearch)"]
    BOOKING["Booking Service<br/>(with Lock Manager)"]
    PAYMENT["Payment Service<br/>(External API)"]

    CACHE["Redis Cache<br/>(Hotel · Availability)"]
    QUEUE["Message Queue<br/>(RabbitMQ/Kafka)"]

    DB["Primary DB<br/>(PostgreSQL)"]
    REPLICA["Read Replica<br/>(Analytics)"]

    NOTIFY["Notification Service<br/>(Email/SMS/Push)"]
    MONITORING["Monitoring<br/>(Prometheus + Grafana)"]

    User --> LB
    LB --> WEB

    WEB --> CACHE
    WEB --> SEARCH
    WEB --> BOOKING

    BOOKING --> QUEUE
    BOOKING --> DB

    PAYMENT -->|Async| QUEUE
    QUEUE --> NOTIFY
    QUEUE --> REPLICA

    DB --> REPLICA
    WEB --> MONITORING

Step 3: Concurrency, Consistency & Scalability¶

🔴 Problem: Double Booking¶

Scenario: Two users simultaneously click "Book Room 101 for Jan 15-17" when only 1 room is available.

Solutions (from weakest to strongest):

Approach	Implementation	Pros	Cons
Optimistic Lock	Version field in AVAILABILITY row	Simple, fast	Retries on conflict (lost bookings)
Pessimistic Lock	`SELECT ... FOR UPDATE` before booking	Guaranteed correctness	Slower (holds locks)
Row-Level Lock	Database constraint (unique booking timestamp)	Consistent	Requires good DB design
Distributed Lock	Redis SET NX or Zookeeper	Works across services	Complexity, deadlock risk

Recommended: Pessimistic Lock FOR SMALL INVENTORY (rooms <1000). For larger inventories, Optimistic Lock + Retry.

-- Pessimistic (Transaction)
BEGIN;
SELECT inventory_count FROM availability 
  WHERE room_id = ? AND date = ? AND day_of_checkout = ?
  FOR UPDATE;

IF inventory_count > 0 THEN
    UPDATE availability SET inventory_count = inventory_count - 1;
    INSERT INTO bookings (...);
    COMMIT;
END IF;

🟡 Problem: Race Condition in Refunds¶

Scenario: Cancellation processed twice → double refund.

Solution: Idempotency keys

POST /bookings/{booking_id}/cancel
  Header: Idempotency-Key: <UUID>

Service checks: If Idempotency-Key exists + CANCELLED → return cached response
If new → process once, cache result

🟢 Scalability: Availability Search¶

Problem: Searching across millions of hotel-date combinations is slow.

Solution: Denormalized Availability Cache + Elasticsearch

AVAILABILITY_DENORM (cached in Redis/Memcached):
  Key: hotel:NYC:2025-01:STANDARD → [dates with available rooms]
  TTL: 1 hour (refresh on booking)

Or: Elasticsearch index for faceted search
  Index: hotels_availability_2025_01
  Shard by: (location, hotel_id, room_type)

💾 Data Consistency Strategy¶

Data Type	Consistency	Strategy
Bookings	Strong ACID	Transactions, duplicates = error
Availability	Eventual	Cache + async reconciliation
User Profiles	Eventual	Read replicas OK
Analytics	Weak	Batch jobs (Spark, daily)

Step 4: Persistence Layer, Caching & Monitoring¶

Database Design¶

Primary Database: PostgreSQL

-- Indexes critical for performance
CREATE INDEX idx_availability_search 
  ON availability(hotel_id, check_in_date, room_type, inventory_count);

CREATE INDEX idx_bookings_user 
  ON bookings(user_id, created_at DESC);

-- Partitioning by date (reduces scan size)
CREATE TABLE availability_2025_01 PARTITION OF availability
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

Replica for Analytics (not real-time):

Primary (write) ← PostgreSQL Master
└─ Replica (read analytics) ← PostgreSQL Standby
   └─ [Every 6 hrs] → Data Warehouse (Snowflake/BigQuery)

Caching Strategy¶

TIER 1: HTTP Cache (CDN)
├─ Static: Hotel listing metadata (TTL: 1 day)
└─ Dynamic: Search results (TTL: 5 min)

TIER 2: Application Cache (Redis)
├─ hotel:{id} → hotel details (TTL: 1 hr)
├─ availability:{room_id}:{date} → count + price (TTL: 30 sec)
└─ user_bookings:{user_id} → [bookings] (TTL: 10 min)

TIER 3: Database Query Cache
├─ Result-set caching (expensive queries)
└─ Prepared statements (reduced parse overhead)

Invalidation:
- On booking → invalidate availability:{room_id}:{all_dates_in_range}
- On price update → invalidate hotel:{id}
- Backfill missing data on cache miss (cache-aside pattern)

Search Indexing (Elasticsearch)¶

{
  "index": "hotels_search",
  "mappings": {
    "properties": {
      "hotel_id": { "type": "keyword" },
      "location": { "type": "geo_point" },
      "name": { "type": "text", "analyzer": "standard" },
      "amenities": { "type": "keyword" },
      "rating": { "type": "float" },
      "min_price": { "type": "integer" },
      "availability_2025_01": { "type": "sparse_vector" }
    }
  }
}

-- Search query (filtered by check-in date)
GET /hotels/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "location.city": "NYC" } },
        { "range": { "rating": { "gte": 4.0 } } }
      ],
      "filter": [
        { "term": { "availability_2025_01.2025-01-15": 1 } }
      ]
    }
  }
}

Monitoring & Alerts¶

Key Metrics to Track:

Booking Success Rate — % of bookings completed vs. attempted
Double-Booking Incidents — Should be ~0 (audit daily)
Search Latency p99 — Target <200ms
Cache Hit Ratio — Should be >80% for availability
Payment Failure Rate — Track failed payments, retries
Notification Delivery — Email open rate, SMS delivery rate

# Prometheus alerts
- alert: BookingSuccessRateLow
  expr: booking_success_rate < 0.95
  for: 5m
  annotations: "Booking success rate below 95%"

- alert: DoubleBookingDetected
  expr: double_booking_count > 0
  for: 1m
  annotations: "Double booking detected - investigate immediately"

- alert: SearchLatencyHigh
  expr: search_latency_p99 > 200
  annotations: "Search latency exceeding SLA"

⚡ Quick Reference Cheat Sheet¶

When to Use What¶

Need	Technology	Why
Strong consistency	Pessimistic lock + Transaction	No double bookings
Availability search fast	Elasticsearch + date index	Sub-100ms faceted search
Real-time inventory	Redis cache (short TTL)	Instant updates on booking
Historical queries	Read replica + data warehouse	Don't slow down production
Geographic search	Geospatial index (PostGIS)	Nearby hotels efficiently

Critical Design Decisions¶

Booking must be transactional (strong consistency) — use pessimistic lock or distributed lock
Search can be eventually consistent — cache availability, refresh on change
Cancellation must be idempotent — use Idempotency-Key header
High-traffic periods → Implement booking queue (asynchronous processing) with SLA
Notifications → async (don't block booking response); retry on failure

Tech Stack Summary¶

Frontend: React + TypeScript
Backend: Java/Node.js (stateless)
Database: PostgreSQL (primary) + Replica
Cache: Redis
Search: Elasticsearch
Queue: RabbitMQ/Kafka
Payment: Stripe/PayPal (API)
Notifications: SendGrid (email), Twilio (SMS)
Monitoring: Prometheus + Grafana
Logging: ELK Stack

🎯 Interview Summary (5 Minutes)¶

If you have only 5 minutes before the interview, memorize this:

Clear requirements: Search must be fast, booking must prevent overbooking
Prevent double bookings: Use transactions + pessimistic lock on inventory
Make search fast: Denormalized availability cache (Redis) + Elasticsearch
Handle cancellations safely: Idempotency-Key in headers
Scale read traffic: Use read replicas for non-booking queries
Monitor closely: Track booking success rate and double-booking attempts (should be 0)