# Kafka Deep Dive

Level: Intermediate
Pre-reading: 04 · Event-Driven Architecture · 04.01 · Event Types and Patterns
## Kafka Architecture Overview

Apache Kafka is a distributed event streaming platform designed for high throughput, durability, and horizontal scalability.
```mermaid
graph TD
    subgraph "Kafka Cluster"
        B1[Broker 1]
        B2[Broker 2]
        B3[Broker 3]
    end
    P1[Producer 1] --> B1
    P2[Producer 2] --> B2
    C1[Consumer Group A] --> B1
    C1 --> B2
    C2[Consumer Group B] --> B2
    C2 --> B3
    ZK[ZooKeeper / KRaft] --> B1
    ZK --> B2
    ZK --> B3
```
## Core Concepts

### Topics and Partitions

A topic is a named stream of events. Topics are split into partitions: ordered, immutable logs.
```mermaid
graph TD
    T["Topic: orders"]
    T --> P0[Partition 0]
    T --> P1[Partition 1]
    T --> P2[Partition 2]
    P0 --> M0A[Offset 0]
    P0 --> M0B[Offset 1]
    P0 --> M0C[Offset 2]
    P1 --> M1A[Offset 0]
    P1 --> M1B[Offset 1]
```
| Concept | Description |
|---|---|
| Topic | Logical channel; named stream |
| Partition | Physical unit; ordered log; parallelism unit |
| Offset | Position of message in partition; monotonically increasing |
| Replication Factor | Number of copies across brokers |
### Partition Key

The partition key determines which partition a message is written to. Messages with the same key always go to the same partition, so ordering is guaranteed per key.
```java
// The same orderId always maps to the same partition
producer.send(new ProducerRecord<>(
    "orders",       // topic
    order.getId(),  // key (partition key)
    orderEvent      // value
));
```
| Key Strategy | Use Case |
|---|---|
| Entity ID | Order ID, Customer ID — events for same entity ordered |
| Random/null | Load distribution; no ordering needed |
| Region/tenant | Shard by geography or tenant |
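Under the hood, Kafka's default partitioner hashes the key bytes and takes the result modulo the partition count (Kafka actually uses a Murmur2 hash; the sketch below substitutes a plain JDK array hash for brevity). This is why equal keys always land on the same partition, and why changing the partition count later remaps keys:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified stand-in for Kafka's default partitioner: hash the key
// bytes, then take the hash modulo the number of partitions.
// (Kafka uses Murmur2; Arrays.hashCode is used here for brevity.)
public class SimplePartitioner {
    public static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Mask off the sign bit so the result is non-negative
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same order ID always lands on the same partition
        int p1 = partitionFor("order-42", 3);
        int p2 = partitionFor("order-42", 3);
        System.out.println(p1 == p2); // always true for equal keys
    }
}
```

Note that `partitionFor("order-42", 3)` and `partitionFor("order-42", 6)` may differ, which is the caveat behind increasing partition counts on keyed topics.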
## Producers

Producers write messages to topics.

### Producer Configuration
| Config | Purpose | Recommendation |
|---|---|---|
| `acks` | Acknowledgment level | `all` for durability |
| `retries` | Retry count | High (e.g., 10) |
| `enable.idempotence` | Prevent duplicates | `true` |
| `batch.size` | Batch messages | Tune for throughput |
| `linger.ms` | Wait to batch | 5–10 ms for throughput |
### Acknowledgment Levels

| `acks` | Behavior | Durability | Latency |
|---|---|---|---|
| `0` | Fire and forget | None | Lowest |
| `1` | Leader acknowledged | Medium | Medium |
| `all` | All in-sync replicas acknowledged | Highest | Highest |
```java
Properties props = new Properties();
props.put("bootstrap.servers", "kafka:9092");
props.put("acks", "all");
props.put("enable.idempotence", "true");
props.put("retries", 10);
props.put("key.serializer", StringSerializer.class);
props.put("value.serializer", KafkaAvroSerializer.class);

KafkaProducer<String, OrderEvent> producer = new KafkaProducer<>(props);
```
## Consumers and Consumer Groups

### Consumer Groups

A consumer group is a set of consumers that cooperatively consume a topic. Within a group, each partition is consumed by exactly one consumer.
```mermaid
graph TD
    subgraph "Topic: orders (4 partitions)"
        P0[Partition 0]
        P1[Partition 1]
        P2[Partition 2]
        P3[Partition 3]
    end
    subgraph "Consumer Group A (2 consumers)"
        C1[Consumer 1]
        C2[Consumer 2]
    end
    P0 --> C1
    P1 --> C1
    P2 --> C2
    P3 --> C2
```
### Consumer Configuration

| Config | Purpose | Recommendation |
|---|---|---|
| `group.id` | Consumer group identifier | Unique per logical consumer |
| `auto.offset.reset` | Where to start when no committed offset exists | `earliest` or `latest` |
| `enable.auto.commit` | Auto-commit offsets | `false` for reliability |
| `max.poll.records` | Records per poll | Tune for processing time |
| `session.timeout.ms` | Heartbeat timeout | 10–30 seconds |
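A consumer configuration following the table above might look like the sketch below. The property names are the standard kafka-clients keys; the values (group ID, poll size, timeout) are illustrative, and the resulting `Properties` would be passed to `new KafkaConsumer<>(props)`:

```java
import java.util.Properties;

// Sketch of consumer settings mirroring the table above; values are
// illustrative, not the only valid choices.
public class ConsumerConfigExample {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "order-processor");   // unique per logical consumer
        props.put("auto.offset.reset", "earliest"); // start from the beginning when no committed offset exists
        props.put("enable.auto.commit", "false");   // commit manually after processing
        props.put("max.poll.records", "200");       // tune to per-record processing time
        props.put("session.timeout.ms", "15000");   // 15 s heartbeat timeout
        return props;
    }
}
```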
### Offset Management

```mermaid
sequenceDiagram
    participant C as Consumer
    participant K as Kafka
    participant DB as Application DB
    C->>K: Poll messages
    K->>C: Messages (offsets 100-109)
    C->>DB: Process and save
    C->>K: Commit offset 110
    Note over C,K: If the consumer crashes before the commit, it reprocesses from offset 100
```
Manual commit for reliability:

```java
while (true) {
    ConsumerRecords<String, OrderEvent> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, OrderEvent> record : records) {
        processEvent(record.value()); // process each record
    }
    consumer.commitSync(); // commit only after the batch is processed
}
```
## Delivery Guarantees
| Guarantee | Description | Configuration |
|---|---|---|
| At most once | May lose; no duplicates | Auto-commit; acks=0 |
| At least once | May duplicate; no loss | Manual commit; acks=all; idempotent consumer |
| Exactly once | No loss; no duplicates | Transactions + idempotent processing |
### Achieving Exactly-Once

```java
// Producer side: enable idempotence and transactions
props.put("enable.idempotence", "true");
props.put("transactional.id", "order-processor");

// Consumer side: read committed messages only
props.put("isolation.level", "read_committed");
```
**Exactly-once is expensive.** True exactly-once requires transactions and adds latency. Often, at-least-once delivery with idempotent consumers is simpler and sufficient.
### Idempotent Consumer Pattern

```java
public void processEvent(OrderEvent event) {
    // Skip events that have already been processed
    if (processedEventRepository.exists(event.getEventId())) {
        return;
    }
    // Process
    orderService.processOrder(event);
    // Mark as processed
    processedEventRepository.save(event.getEventId());
}
```
## Replication and Fault Tolerance

### ISR — In-Sync Replicas
| Concept | Description |
|---|---|
| Leader | Handles reads/writes for partition |
| Follower | Replicates from leader |
| ISR | Replicas caught up to leader |
| min.insync.replicas | Minimum ISR for acks=all |
```mermaid
graph LR
    subgraph "Partition 0"
        L[Leader - Broker 1]
        F1[Follower - Broker 2]
        F2[Follower - Broker 3]
    end
    L --> F1
    L --> F2
```
### Recommended Configuration

With this configuration, each partition has 3 replicas and a write succeeds once at least 2 replicas (including the leader) acknowledge it. The cluster survives one broker failure without losing acknowledged writes.
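The configuration referred to above is not shown in the text; under the stated description it would correspond to settings along these lines (illustrative values, not the only valid combination):

```properties
# Topic / broker settings (3 copies, at least 2 in sync for a write)
replication.factor=3
min.insync.replicas=2

# Producer setting
acks=all
```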
## Schema Registry

Centralized schema management for Avro, Protobuf, or JSON Schema.
```mermaid
graph LR
    P[Producer] -->|Register schema| SR[Schema Registry]
    P -->|Send with schema ID| K[Kafka]
    K --> C[Consumer]
    C -->|Fetch schema| SR
```
### Benefits
| Benefit | Description |
|---|---|
| Schema validation | Reject messages that don't conform |
| Compatibility checks | Prevent breaking changes |
| Schema evolution | Track schema versions |
| Efficient serialization | Schema ID in message, not full schema |
### Compatibility Modes

| Mode | Check | Can Add Fields | Can Delete Fields |
|---|---|---|---|
| BACKWARD | New schema reads old data | With defaults only | Any |
| FORWARD | Old schema reads new data | Any | With defaults only |
| FULL | Both directions | With defaults only | With defaults only |
| NONE | No checks | Any | Any |
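As a hypothetical example of a BACKWARD-compatible change, adding a `currency` field to an `OrderEvent` Avro schema is accepted only because it carries a default, so consumers on the new schema can still decode old records that lack the field (field and namespace names here are illustrative):

```json
{
  "type": "record",
  "name": "OrderEvent",
  "namespace": "com.example.orders",
  "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"}
  ]
}
```

Submitting the same schema without the `default` on `currency` would be rejected by the registry in BACKWARD mode.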
## Dead Letter Queue (DLQ)

Handle messages that still fail after the maximum number of retries.
```mermaid
graph LR
    T[orders topic] --> C[Consumer]
    C -->|Process| App[Application]
    C -->|Retry exhausted| DLQ[orders.dlq topic]
    DLQ --> Alert[Alerting]
    DLQ --> Reprocess[Manual reprocess]
```
```java
public void processWithRetry(ConsumerRecord<String, OrderEvent> record) {
    int retries = 0;
    while (retries < MAX_RETRIES) {
        try {
            process(record.value());
            return;
        } catch (RetryableException e) {
            retries++;
            backoff(retries);
        }
    }
    // Retries exhausted: route the message to the DLQ
    dlqProducer.send(new ProducerRecord<>("orders.dlq", record.key(), record.value()));
}
```
## Kafka vs Alternatives
| Aspect | Kafka | RabbitMQ | AWS SQS |
|---|---|---|---|
| Model | Log-based | Queue-based | Queue-based |
| Ordering | Per partition | Per queue (with caveats) | FIFO queues only |
| Retention | Time/size-based | Until consumed | 14 days max |
| Replay | Yes | No | No |
| Throughput | Very high | High | Moderate |
| Managed | Confluent, MSK | CloudAMQP | Native AWS |
## Kafka Best Practices
| Area | Recommendation |
|---|---|
| Partitions | Start with #partitions = expected consumers |
| Keys | Use entity ID for ordering; null for load distribution |
| Serialization | Avro + Schema Registry for contracts |
| Retention | Set based on replay needs |
| Monitoring | Consumer lag, under-replicated partitions |
| Idempotency | Design idempotent consumers |
### How many partitions should a topic have?

Start with the number of partitions equal to the expected maximum number of consumers in a group. More partitions mean more parallelism, but also more overhead (file handles, leader elections, longer rebalances). You can increase the partition count later, but messages with the same key may then hash to different partitions. Typical ranges: 6–12 for moderate load, 50+ for high-throughput topics.
### What happens when a Kafka consumer crashes?

The consumer group rebalances: partitions assigned to the crashed consumer are redistributed to the remaining consumers. Messages after the last committed offset are replayed (at-least-once). If auto-commit is enabled, some messages may instead be lost (at-most-once). Either way, design consumers to be idempotent.
### How does Kafka achieve exactly-once semantics?

(1) Idempotent producer: each message carries a sequence number, and the broker deduplicates retries. (2) Transactions: writes across partitions and the consumer offset commit happen atomically. (3) Read committed: consumers only see messages from committed transactions. (4) Idempotent consumer: application-level deduplication completes end-to-end exactly-once.