Database Requirements Estimation

1. Data Size per User

  • User Profile Data:
    • 1KB - 10KB per user (depends on complexity).
  • Activity Logs:
    • 100KB - 500KB per user/day (depending on app type and usage).
  • Media (e.g., images, videos):
    • 100KB - several MB per user (depending on the app).
  • Example: A social media platform with 10M daily active users (DAUs), each generating 100KB of data per day, produces 10M * 100KB = 1TB of new data per day.
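
The arithmetic in the example above can be checked with a short sketch. It assumes decimal units (1 KB = 1,000 bytes), which is what the round figures in the text imply; the function name `daily_data_bytes` is made up for illustration.

```python
# Decimal units (1 KB = 1,000 bytes), an assumption that matches the
# round numbers used in the text.
KB = 1_000
TB = 1_000**4


def daily_data_bytes(dau: int, bytes_per_user_per_day: int) -> int:
    """Total bytes generated per day across all daily active users."""
    return dau * bytes_per_user_per_day


total = daily_data_bytes(dau=10_000_000, bytes_per_user_per_day=100 * KB)
print(f"{total / TB:.1f} TB/day")  # prints "1.0 TB/day"
```

Swapping in 500 KB per user gives the upper end of the activity-log range (5 TB/day for the same 10M DAUs).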

2. Traffic Volume (API Calls)

  • API Calls per User per Day:
    • 30 - 100 API calls per user per day (typical range, depends on the app).
  • Total API Calls per Day:
    • 1M DAUs at 50 API calls/user/day results in 50M API calls/day.
    • At the same rate, a 10M DAU app makes 500M API calls/day.
  • Reads vs Writes:
    • Reads: 70%-90% of operations in typical apps.
    • Writes: 10%-30% of operations (may vary by app).
  • Example: If 50M API calls/day are made and 20% are writes, that’s 10M writes/day.
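
As a rough sketch, the call-volume figures above reduce to two multiplications. The helper names are illustrative only, and the write share is passed as an integer percentage to keep the arithmetic exact.

```python
def daily_api_calls(dau: int, calls_per_user_per_day: int) -> int:
    """Total API calls per day for a given number of daily active users."""
    return dau * calls_per_user_per_day


def daily_writes(total_calls: int, write_percent: int) -> int:
    """Number of calls per day that are writes, given an integer percentage."""
    return total_calls * write_percent // 100


calls_1m = daily_api_calls(1_000_000, 50)
calls_10m = daily_api_calls(10_000_000, 50)
print(calls_1m, calls_10m)           # prints "50000000 500000000"
print(daily_writes(calls_1m, 20))    # prints "10000000"
```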

3. Storage Requirements

  • Data Growth:
    • Estimate 10%-20% monthly growth in data (adjust for the type of app and the stage of the company). Compounded over a year, that is roughly 3x to 9x.
  • Storage per User:
    • User-related data: 1KB - 10KB per user.
    • Media-heavy apps: 100KB - several MB per user.
  • Example:
    • For 10M DAUs with 1KB of profile data per user and 100KB of activity logs per user per day, storage works out to:
      • User data: 10M users * 1KB = 10GB (a one-time total, not a daily figure).
      • Activity logs: 10M users * 100KB = 1TB of logs per day.
  • Total Storage Estimate: 1TB - 10TB/day of new data for a platform with heavy activity, depending on per-user activity and media volume.
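
To see how the one-time profile data, the daily logs, and monthly growth combine, here is a minimal sketch. It assumes decimal units, no log deletion or compression, and a constant compound growth rate; all function names are hypothetical.

```python
KB, TB = 1_000, 1_000**4  # decimal units, as the text's round numbers imply


def stored_after_days(users: int, profile_bytes: int,
                      log_bytes_per_day: int, days: int) -> int:
    """Profile data is stored once; logs accumulate daily (no deletion assumed)."""
    return users * profile_bytes + users * log_bytes_per_day * days


def project_growth(current: float, monthly_growth: float, months: int) -> float:
    """Compound a quantity at a fixed monthly growth rate (e.g. 0.10 for 10%)."""
    return current * (1 + monthly_growth) ** months


month_total = stored_after_days(10_000_000, 1 * KB, 100 * KB, days=30)
print(f"{month_total / TB:.2f} TB after 30 days")  # prints "30.01 TB after 30 days"

# Daily ingest of 1 TB/day after a year at 10% and 20% monthly growth.
for rate in (0.10, 0.20):
    print(f"{rate:.0%}: {project_growth(1.0, rate, 12):.1f} TB/day")
```

The profile data barely registers next to the logs, which is why the daily log volume dominates the estimate.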

4. Database Operations

  • Reads: 70% - 90% of operations.
  • Writes: 10% - 30% of operations.
  • Write-heavy applications (e.g., logging services) will have higher write percentages.
  • Example: For 50M API calls/day with 80% reads and 20% writes, that results in:
    • Reads: 40M reads/day.
    • Writes: 10M writes/day.
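
Daily totals are easier to size against when converted to a per-second rate. The sketch below splits the example's 50M operations and averages them over a day; it assumes load is spread evenly, which real traffic is not, so peak rates will be higher. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def split_operations(total_ops: int, read_percent: int) -> tuple[int, int]:
    """Split a daily operation count into (reads, writes) by integer percentage."""
    reads = total_ops * read_percent // 100
    writes = total_ops - reads
    return reads, writes


reads, writes = split_operations(50_000_000, read_percent=80)
print(reads, writes)  # prints "40000000 10000000"

# Average rate per second, assuming perfectly even traffic across the day.
print(round(reads / SECONDS_PER_DAY), round(writes / SECONDS_PER_DAY))  # prints "463 116"
```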

5. Sharding and Partitioning

  • Sharding Strategy: Split data across multiple databases based on user ID, region, or other logical partitioning methods.
  • Example:
    • A single shard might handle 1M - 10M users; alternatively, shards can be split by geographic region.
    • For 50M DAUs, you might have 5-10 shards, each handling 5M-10M users.
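
One common way to shard by user ID is to hash the ID and take it modulo the shard count. The sketch below is one possible approach, not a prescribed one; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def shards_needed(total_users: int, users_per_shard: int) -> int:
    """Smallest number of shards so that no shard exceeds users_per_shard."""
    return -(-total_users // users_per_shard)  # integer ceiling division


def shard_for_user(user_id: int, num_shards: int) -> int:
    """Map a user ID to a shard index using a stable hash of the ID."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


print(shards_needed(50_000_000, 10_000_000))  # prints "5"
print(shards_needed(50_000_000, 5_000_000))   # prints "10"

shard = shard_for_user(user_id=12345, num_shards=5)
assert 0 <= shard < 5  # the same user always maps to the same shard
```

A plain modulo scheme like this moves most keys when the shard count changes, so systems that expect to add shards often use consistent hashing or a lookup table instead.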

6. Replication and Availability

  • Primary and Replica DBs: Use a primary-replica setup to scale reads: the primary accepts writes, and replicas serve read traffic.
    • Replicas: Typically 2-5 replicas for high availability.
  • Example: For a system with 50M API calls/day and heavy read traffic, replicas spread the read load across multiple database servers instead of concentrating it on the primary.
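
The routing idea can be sketched as a small class that sends writes to the primary and rotates reads across replicas. This is a simplified illustration, not a production router: the class name and the server names are hypothetical, and it ignores health checks and failover.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas (round-robin)."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replica_cycle = itertools.cycle(replicas or [primary])

    def route(self, is_write: bool) -> str:
        """Return the name of the server that should handle this operation."""
        if is_write:
            return self.primary
        return next(self._replica_cycle)


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route(is_write=True))   # prints "db-primary"
print(router.route(is_write=False))  # prints "db-replica-1"
print(router.route(is_write=False))  # prints "db-replica-2"
```

With asynchronous replication, a replica may briefly lag behind the primary, so reads that must see a just-written value are often sent to the primary as well.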

7. Indexes

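  • Indexing for Speed: Index the columns that queries filter or sort on most often (e.g., user IDs, timestamps).
  • Trade-offs: Indexes improve read performance but slow down writes, because every insert, update, or delete must also update each affected index. They also take extra storage.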
  • Example: Indexing user IDs for a user profile look-up or timestamps for activity logs.
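
As a small illustration of the activity-log case, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table, column, and index names are made up for the example, and other database engines report query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_logs ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT, action TEXT)"
)
# Composite index: look up one user's logs, ordered or filtered by time.
conn.execute(
    "CREATE INDEX idx_logs_user_time ON activity_logs (user_id, created_at)"
)

# Ask SQLite how it would execute a typical lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT action FROM activity_logs WHERE user_id = ? AND created_at >= ?",
    (42, "2024-01-01"),
).fetchall()

for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step
```

The printed plan step should mention `idx_logs_user_time`, indicating that the lookup uses the index rather than scanning the whole table.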

8. Backup and Disaster Recovery

  • Backup Strategy: Regular backups (e.g., hourly, daily) to ensure data recovery in case of failure.
  • Example: For an e-commerce platform with 50M DAUs and heavy transactions, backups might be taken every 30 minutes to 1 hour.
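
One way to reason about backup frequency is to estimate how many writes could be lost if a failure happens just before the next backup. The sketch below assumes a uniform write rate and that backups are the only recovery mechanism. The 10M writes/day input is borrowed from the earlier read/write example purely for illustration, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def worst_case_writes_lost(writes_per_day: int, backup_interval_minutes: int) -> int:
    """Writes made during one full backup interval, assuming a uniform write rate."""
    return writes_per_day * backup_interval_minutes * 60 // SECONDS_PER_DAY


for minutes in (30, 60):
    lost = worst_case_writes_lost(10_000_000, minutes)
    print(f"{minutes} min interval: up to {lost} writes at risk")
```

Halving the interval halves the exposure, at the cost of running backups twice as often.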

General Tips for Database Requirements Estimation

  • Assume 10%-20% Monthly Growth in Data: Always account for data growth, especially in rapidly growing apps.
  • Use 70%-90% Reads and 10%-30% Writes: As a general rule, reads typically outnumber writes unless your app is very write-heavy (e.g., logging systems).
  • Consider the App Type: For media-heavy apps, storage and data volume will be significantly higher.
  • Factor in Sharding and Partitioning: Plan for horizontal scaling via sharding as DAU grows.