Bulkhead Pattern — Deep Dive

Level: Intermediate
Pre-reading: 06 · Resilience & Reliability


What is a Bulkhead?

A bulkhead isolates resources so that a failure in one area cannot exhaust resources for the entire system. The pattern is named after the watertight compartments in a ship's hull — if one compartment floods, the others stay dry.

graph TD
    subgraph Without Bulkhead
        A1[All requests] --> P1[Shared Thread Pool]
        P1 --> S1[Service A - Slow]
        P1 --> S2[Service B - Fast]
        P1 --> S3[Service C - Fast]
    end

graph TD
    subgraph With Bulkhead
        A2[Requests] --> PA[Pool A - 10 threads]
        A2 --> PB[Pool B - 10 threads]
        A2 --> PC[Pool C - 10 threads]
        PA --> SA[Service A - Slow]
        PB --> SB[Service B - Fast]
        PC --> SC[Service C - Fast]
    end

Bulkhead Types

Thread Pool Bulkhead

Each service call has a dedicated thread pool.

| Service | Thread Pool | Max Concurrent |
|---|---|---|
| Payment Service | payment-pool | 20 threads |
| Inventory Service | inventory-pool | 10 threads |
| Notification Service | notification-pool | 5 threads |

Benefit: Slow service only exhausts its pool.
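A minimal plain-JDK sketch of the table above — one fixed pool per downstream dependency, so a slow dependency can only tie up its own threads (pool names and sizes are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// One fixed-size pool per downstream dependency. If the payment service
// hangs, at most 20 threads are stuck; inventory and notification calls
// keep flowing through their own pools.
public class ThreadPoolBulkheads {
    private final ExecutorService paymentPool      = Executors.newFixedThreadPool(20);
    private final ExecutorService inventoryPool    = Executors.newFixedThreadPool(10);
    private final ExecutorService notificationPool = Executors.newFixedThreadPool(5);

    public <T> CompletableFuture<T> callPayment(Supplier<T> call) {
        return CompletableFuture.supplyAsync(call, paymentPool);
    }

    public <T> CompletableFuture<T> callInventory(Supplier<T> call) {
        return CompletableFuture.supplyAsync(call, inventoryPool);
    }

    public <T> CompletableFuture<T> callNotification(Supplier<T> call) {
        return CompletableFuture.supplyAsync(call, notificationPool);
    }
}
```

Note that Executors.newFixedThreadPool backs its pool with an unbounded queue, so production code would also bound the queue and reject when it fills — which is what Resilience4j's queueCapacity setting does.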

Semaphore Bulkhead

Limits concurrent calls without dedicated threads — the caller's own thread does the work, so it is lighter weight.

// Allow at most 10 concurrent calls; callers wait up to 100 ms for a permit.
Semaphore semaphore = new Semaphore(10);

public Result call() throws InterruptedException {
    if (semaphore.tryAcquire(100, TimeUnit.MILLISECONDS)) {
        try {
            return doCall();
        } finally {
            semaphore.release();  // always return the permit
        }
    }
    throw new BulkheadFullException();  // rejected: bulkhead is full
}
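The rejection behaviour can be demonstrated with a runnable sketch (the pool size and timeouts are illustrative): two in-flight calls hold both permits, so a third caller is turned away within its 50 ms wait instead of queueing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SemaphoreBulkheadDemo {
    static final Semaphore permits = new Semaphore(2);  // max 2 concurrent calls

    // Returns true if the call was admitted, false if the bulkhead was full.
    static boolean guardedCall(CountDownLatch inFlight) throws InterruptedException {
        if (!permits.tryAcquire(50, TimeUnit.MILLISECONDS)) {
            return false;  // reject instead of queueing indefinitely
        }
        try {
            inFlight.await();  // simulate a call that is still in flight
            return true;
        } finally {
            permits.release();
        }
    }

    static boolean thirdCallAdmitted() throws Exception {
        CountDownLatch inFlight = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> guardedCall(inFlight));  // occupies permit 1
        pool.submit(() -> guardedCall(inFlight));  // occupies permit 2
        Thread.sleep(200);                         // let both acquire their permits
        boolean admitted = guardedCall(inFlight);  // bulkhead full -> rejected
        inFlight.countDown();                      // release the in-flight calls
        pool.shutdown();
        return admitted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("third call admitted: " + thirdCallAdmitted());
    }
}
```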

Comparison

| Aspect | Thread Pool | Semaphore |
|---|---|---|
| Isolation | Full (separate threads) | Partial (shared threads) |
| Overhead | Higher (context switching) | Lower |
| Timeout | Per-thread timeout | Relies on call timeout |
| Use case | Blocking I/O | Non-blocking / reactive |

Resilience4j Bulkhead

Thread Pool Bulkhead

ThreadPoolBulkheadConfig config = ThreadPoolBulkheadConfig.custom()
    .maxThreadPoolSize(10)
    .coreThreadPoolSize(5)
    .queueCapacity(20)
    .keepAliveDuration(Duration.ofMillis(100))
    .build();

ThreadPoolBulkhead bulkhead = ThreadPoolBulkhead.of("paymentBulkhead", config);

CompletionStage<PaymentResult> result = bulkhead.executeSupplier(
    () -> paymentClient.charge(order)
);

Semaphore Bulkhead

BulkheadConfig config = BulkheadConfig.custom()
    .maxConcurrentCalls(10)
    .maxWaitDuration(Duration.ofMillis(500))
    .build();

Bulkhead bulkhead = Bulkhead.of("paymentBulkhead", config);

Supplier<PaymentResult> decorated = Bulkhead.decorateSupplier(
    bulkhead, () -> paymentClient.charge(order)
);

Annotation-Based

@Service
public class PaymentService {

    @Bulkhead(name = "paymentBulkhead", type = Bulkhead.Type.SEMAPHORE)
    public PaymentResult processPayment(Order order) {
        return paymentClient.charge(order);
    }
}

Bulkhead Configuration

resilience4j:
  bulkhead:
    instances:
      paymentBulkhead:
        maxConcurrentCalls: 10
        maxWaitDuration: 500ms
      inventoryBulkhead:
        maxConcurrentCalls: 20
        maxWaitDuration: 0  # Fail immediately
  thread-pool-bulkhead:
    instances:
      paymentBulkhead:
        maxThreadPoolSize: 10
        coreThreadPoolSize: 5
        queueCapacity: 20

Connection Pool Bulkhead

Separate database connection pools for different operations.

@Configuration
public class DataSourceConfig {

    @Bean("ordersDataSource")
    public DataSource ordersDataSource() {
        HikariConfig config = new HikariConfig();
        config.setMaximumPoolSize(20);
        config.setPoolName("orders-pool");
        return new HikariDataSource(config);
    }

    @Bean("analyticsDataSource")
    public DataSource analyticsDataSource() {
        HikariConfig config = new HikariConfig();
        config.setMaximumPoolSize(5);  // Smaller pool for analytics
        config.setPoolName("analytics-pool");
        return new HikariDataSource(config);
    }
}

Benefit: Slow analytics queries don't exhaust connections for orders.


Process Bulkhead

Separate critical and non-critical workloads at the deployment level.

graph TD
    subgraph Critical Path
        O[Order Service - 10 replicas]
        P[Payment Service - 10 replicas]
    end
    subgraph Non-Critical
        R[Recommendations - 3 replicas]
        A[Analytics - 2 replicas]
    end

  • Deploy critical services with more resources
  • Use separate node pools in Kubernetes
  • Different scaling policies

Kubernetes Resource Isolation

apiVersion: v1
kind: Pod
metadata:
  name: order-service
spec:
  containers:
    - name: order
      resources:
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "1000m"
  nodeSelector:
    workload-type: critical  # Dedicated nodes

Bulkhead with Other Patterns

Bulkhead + Circuit Breaker

@CircuitBreaker(name = "paymentService")
@Bulkhead(name = "paymentBulkhead")
@Retry(name = "paymentService")
public PaymentResult processPayment(Order order) {
    return paymentClient.charge(order);
}

Execution order (Resilience4j's default aspect order): Retry → Circuit Breaker → Bulkhead → Call. The bulkhead is applied closest to the call, so each retry attempt must acquire a bulkhead permit.

Bulkhead + Rate Limiter

@Bulkhead(name = "searchBulkhead")
@RateLimiter(name = "searchRateLimiter")
public SearchResults search(String query) {
    return searchClient.search(query);
}

Monitoring

Metrics

# Semaphore bulkhead
resilience4j_bulkhead_available_concurrent_calls{name="paymentBulkhead"} 8
resilience4j_bulkhead_max_allowed_concurrent_calls{name="paymentBulkhead"} 10

# Thread pool bulkhead
resilience4j_thread_pool_bulkhead_queue_depth{name="paymentBulkhead"} 5
resilience4j_thread_pool_bulkhead_current_thread_count{name="paymentBulkhead"} 10
resilience4j_thread_pool_bulkhead_available_queue_capacity{name="paymentBulkhead"} 15

Alerting

| Metric | Alert Condition |
|---|---|
| Available concurrent calls | < 20% of max, sustained |
| Queue depth | > 80% of capacity |
| Rejected calls | > 0, sustained |

Sizing Bulkheads

Little's Law

Concurrent requests = Arrival rate × Average latency

Example:

  • 100 requests/second
  • 200ms average latency
  • Needed: 100 × 0.2 = 20 concurrent threads minimum

Add headroom for variance:

int concurrentCalls = (int) (requestsPerSecond * avgLatencySeconds * 1.5);
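The same rule as a tiny helper, with the worked example from above (the 1.5 headroom factor is the same illustrative choice):

```java
public class BulkheadSizer {
    // Little's Law plus headroom: concurrency ≈ arrival rate × latency × headroom
    static int size(double requestsPerSecond, double avgLatencySeconds, double headroom) {
        return (int) Math.ceil(requestsPerSecond * avgLatencySeconds * headroom);
    }

    public static void main(String[] args) {
        // 100 req/s at 200 ms average latency with 50% headroom -> 30 permits
        System.out.println(BulkheadSizer.size(100, 0.2, 1.5));
    }
}
```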

Start Conservative

  • Begin with low limits
  • Monitor queue depth and rejections
  • Increase gradually

Common Mistakes

| Mistake | Impact | Fix |
|---|---|---|
| Bulkhead too small | Many rejections | Size based on Little's Law |
| Bulkhead too large | No isolation | Smaller pools per service |
| Single shared pool | No isolation | Separate pools |
| No timeout | Threads blocked forever | Always set timeouts |
| Not monitoring | Don't know when exhausted | Add metrics and alerts |

When should you use thread pool vs semaphore bulkhead?

Use thread pool for blocking I/O operations — it provides full isolation with separate threads. Use semaphore for non-blocking/reactive operations — it's lighter weight and limits concurrency without thread overhead. Most Spring MVC apps use thread pool; WebFlux apps use semaphore.

How do you size a bulkhead?

Use Little's Law: Concurrent = Rate × Latency. For 100 req/s and 200ms latency, you need ~20 concurrent slots. Add 50% headroom = 30. Monitor in production and adjust: if queue is always full, increase; if always empty, decrease.

How does bulkhead differ from rate limiting?

Bulkhead limits concurrent calls — how many requests are in-flight simultaneously. Rate limiter limits throughput — how many requests per time window. Bulkhead prevents resource exhaustion; rate limiter prevents overload. Use both: bulkhead for concurrency, rate limiter for throughput.