Skip to content

14 · B-Tree vs LSM Tree — Storage Engine Index Structures

Storage Internals · Topic 14 of 16


Why Storage Engines Matter

The storage engine determines HOW data is physically written and read. The same SQL query can perform radically differently depending on whether B-Tree or LSM-Tree is underneath.


B-Tree

B-Tree (Balanced Tree) is the dominant index structure in relational databases.

graph TD
    Root["Root Node [50 | 75]"] --> L["[10 | 25 | 40]"]
    Root --> M["[55 | 62 | 70]"]
    Root --> R["[80 | 90 | 95]"]
    L --> D1["Leaf Pages (actual rows)"]
    M --> D2["Leaf Pages"]
    R --> D3["Leaf Pages"]

Write path: Find the leaf page → update in-place → split if overflow → write changed pages to WAL first

Property B-Tree
Read performance Fast (O log n)
Write performance Moderate (random I/O, page splits)
Space efficiency Good
Best for OLTP, point lookups, range queries

LSM Tree (Log-Structured Merge Tree)

LSM Tree is optimized for write-heavy workloads by converting random writes to sequential writes.

graph LR
    Write["Write"] --> ML["MemTable (in-memory)"]
    ML -->|"Flush when full"| L0["L0 SSTables (on disk)"]
    L0 -->|"Compaction"| L1["L1 SSTables"]
    L1 -->|"Compaction"| L2["L2 SSTables"]

Write path: Write to in-memory MemTable + WAL → flush to SSTable files when MemTable is full → background compaction merges SSTables

Read path: Check MemTable → check Bloom Filter → check SSTables (newest first)

Property LSM Tree
Write performance Excellent (sequential I/O)
Read performance Slower (may check multiple SSTables)
Space efficiency Lower (before compaction)
Write amplification High during compaction
Read amplification Mitigated by Bloom Filters
Best for Write-heavy, time-series, logs

Head-to-Head

B-Tree LSM Tree
Write latency Higher (random I/O) Lower (sequential)
Read latency Lower Higher without cache
Space Compact Bloated until compaction
Used by PostgreSQL, MySQL InnoDB, Spanner Cassandra, RocksDB, DynamoDB, LevelDB

Cloud Implementations

DB Engine Type
PostgreSQL Heap + B-Tree indexes B-Tree
Google Spanner Custom B-Tree over Colossus B-Tree
Cassandra SSTable / RocksDB (Stargate) LSM
DynamoDB B-Tree + LSM hybrid Both
MongoDB WiredTiger B-Tree + optional LSM