14 · B-Tree vs LSM Tree — Storage Engine Index Structures
Storage Internals · Topic 14 of 16
Why Storage Engines Matter
The storage engine determines HOW data is physically written and read. The same SQL query can perform radically differently depending on whether B-Tree or LSM-Tree is underneath.
B-Tree
B-Tree (Balanced Tree) is the dominant index structure in relational databases.
graph TD
Root["Root Node [50 | 75]"] --> L["[10 | 25 | 40]"]
Root --> M["[55 | 62 | 70]"]
Root --> R["[80 | 90 | 95]"]
L --> D1["Leaf Pages (actual rows)"]
M --> D2["Leaf Pages"]
R --> D3["Leaf Pages"]
Write path: Find the leaf page → update in-place → split if overflow → write changed pages to WAL first
| Property | B-Tree |
|---|---|
| Read performance | Fast (O log n) |
| Write performance | Moderate (random I/O, page splits) |
| Space efficiency | Good |
| Best for | OLTP, point lookups, range queries |
LSM Tree (Log-Structured Merge Tree)
LSM Tree is optimized for write-heavy workloads by converting random writes to sequential writes.
graph LR
Write["Write"] --> ML["MemTable (in-memory)"]
ML -->|"Flush when full"| L0["L0 SSTables (on disk)"]
L0 -->|"Compaction"| L1["L1 SSTables"]
L1 -->|"Compaction"| L2["L2 SSTables"]
Write path: Write to in-memory MemTable + WAL → flush to SSTable files when MemTable is full → background compaction merges SSTables
Read path: Check MemTable → check Bloom Filter → check SSTables (newest first)
| Property | LSM Tree |
|---|---|
| Write performance | Excellent (sequential I/O) |
| Read performance | Slower (may check multiple SSTables) |
| Space efficiency | Lower (before compaction) |
| Write amplification | High during compaction |
| Read amplification | Mitigated by Bloom Filters |
| Best for | Write-heavy, time-series, logs |
Head-to-Head
| B-Tree | LSM Tree | |
|---|---|---|
| Write latency | Higher (random I/O) | Lower (sequential) |
| Read latency | Lower | Higher without cache |
| Space | Compact | Bloated until compaction |
| Used by | PostgreSQL, MySQL InnoDB, Spanner | Cassandra, RocksDB, DynamoDB, LevelDB |
Cloud Implementations
| DB | Engine | Type |
|---|---|---|
| PostgreSQL | Heap + B-Tree indexes | B-Tree |
| Google Spanner | Custom B-Tree over Colossus | B-Tree |
| Cassandra | SSTable / RocksDB (Stargate) | LSM |
| DynamoDB | B-Tree + LSM hybrid | Both |
| MongoDB | WiredTiger | B-Tree + optional LSM |