Timeout Pattern — Deep Dive

Level: Intermediate
Pre-reading: 06 · Resilience & Reliability

Why Timeouts Matter

Without timeouts, a slow or unresponsive service can:

Block threads indefinitely
Exhaust connection pools
Cascade failures upstream
Leave users waiting forever

Rule: Every remote call needs a timeout.
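A minimal JDK-only sketch of the rule, assuming the blocking call can run on a worker thread: `Future.get(timeout, unit)` bounds how long the caller waits. `slowRemoteCall` is a hypothetical stand-in that only sleeps. A real client should still set its own connect and read timeouts, because giving up on the wait does not necessarily release the underlying connection.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    // Hypothetical stand-in for a remote call that may hang for a long time
    static String slowRemoteCall(long sleepMillis) throws InterruptedException {
        Thread.sleep(sleepMillis);
        return "response";
    }

    // Waits at most timeoutMillis, then returns a fallback instead of blocking the caller forever
    static String callWithTimeout(long sleepMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(() -> slowRemoteCall(sleepMillis));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; Thread.sleep reacts to interrupts
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        } finally {
            executor.shutdownNow();
        }
    }
}
```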

Types of Timeouts

Type	Description	Typical Value
Connect timeout	Time to establish TCP connection	1-5 seconds
Read timeout	Max wait for response data after the request is sent (often measured per read, not as a total)	5-30 seconds
Write timeout	Time to send request body	5-10 seconds
Request timeout	Total time for entire operation	10-60 seconds

sequenceDiagram
    participant C as Client
    participant S as Server
    Note over C,S: Connect timeout window
    C->>S: TCP SYN
    S->>C: TCP SYN-ACK
    C->>S: TCP ACK
    Note over C,S: Connection established
    Note over C,S: Write timeout window
    C->>S: HTTP Request
    Note over C,S: Read timeout window
    S->>C: HTTP Response
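For comparison, the JDK's own `java.net.http.HttpClient` (Java 11+) exposes two of these knobs directly: a client-wide connect timeout and a per-request timeout. As far as its builder API goes, there is no separate read or write setting. A minimal sketch; the URL passed in is a placeholder.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

class JdkHttpTimeouts {
    // Connect timeout: bounds establishing a new connection; configured once per client
    static HttpClient client() {
        return HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    }

    // Request timeout: per request; if the response does not arrive in time,
    // send() throws HttpTimeoutException
    static HttpRequest request(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .timeout(Duration.ofSeconds(10))
            .GET()
            .build();
    }
}
```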

Configuring Timeouts

RestTemplate

@Bean
public RestTemplate restTemplate() {
    HttpComponentsClientHttpRequestFactory factory = 
        new HttpComponentsClientHttpRequestFactory();
    factory.setConnectTimeout(5000);     // 5 seconds
    factory.setReadTimeout(10000);       // 10 seconds
    return new RestTemplate(factory);
}

WebClient

@Bean
public WebClient webClient() {
    HttpClient httpClient = HttpClient.create()
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000)
        .responseTimeout(Duration.ofSeconds(10));

    return WebClient.builder()
        .clientConnector(new ReactorClientHttpConnector(httpClient))
        .build();
}

Feign

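# Prefix used by older Spring Cloud OpenFeign versions; 4.x uses spring.cloud.openfeign.client.config
feign: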
  client:
    config:
      default:
        connectTimeout: 5000
        readTimeout: 10000
      payment-service:
        readTimeout: 30000  # Longer for payment

HTTP Client (Apache)

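// Apache HttpClient 4.x (org.apache.http.client.config.RequestConfig)
RequestConfig config = RequestConfig.custom()
    .setConnectTimeout(5000)            // Time to establish the TCP connection
    .setSocketTimeout(10000)            // Read timeout: max inactivity between data packets
    .setConnectionRequestTimeout(3000)  // Time to get a connection from the pool
    .build();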

CloseableHttpClient client = HttpClients.custom()
    .setDefaultRequestConfig(config)
    .build();

Deadline Propagation

Pass remaining time budget through service call chains. If the client has 10s total and the first hop takes 3s, only 7s remain for the rest.

sequenceDiagram
    participant C as Client
    participant A as Service A
    participant B as Service B
    participant D as Service C

    Note over C: Deadline: T+10s
    C->>A: Request (X-Request-Deadline: T+10s)
    Note over A: Remaining: 10s
    A->>A: Process (2s)
    Note over A: Remaining: 8s
    A->>B: Request (X-Request-Deadline: T+10s)
    Note over B: Remaining: 8s
    B->>B: Process (3s)
    Note over B: Remaining: 5s
    B->>D: Request (X-Request-Deadline: T+10s)
    Note over D: Remaining: 5s
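The arithmetic in the diagram can be sketched with `java.time`, assuming the header carries an ISO-8601 instant (the format `Instant.toString()` produces) and that service clocks are reasonably synchronized. A fixed `Clock` keeps it testable; `DeadlineBudget` and its methods are illustrative names, not an existing library.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

class DeadlineBudget {
    // Parses an absolute deadline such as "2024-01-01T00:00:10Z"
    static Instant parseDeadline(String headerValue) {
        return Instant.parse(headerValue);
    }

    // Remaining budget = deadline - now, floored at zero
    static Duration remaining(Instant deadline, Clock clock) {
        Duration left = Duration.between(clock.instant(), deadline);
        return left.isNegative() ? Duration.ZERO : left;
    }

    // Timeout for the next hop: never longer than the locally configured value,
    // and never longer than what is left of the caller's budget
    static Duration nextHopTimeout(Duration configured, Instant deadline, Clock clock) {
        Duration left = remaining(deadline, clock);
        return left.compareTo(configured) < 0 ? left : configured;
    }
}
```

Because an absolute deadline depends on hosts agreeing on the time, some protocols send a relative duration instead; gRPC, for example, transmits a `grpc-timeout` header that each hop converts back into a local deadline.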

Implementation

@Component
public class DeadlineInterceptor implements ClientHttpRequestInterceptor {

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body,
            ClientHttpRequestExecution execution) throws IOException {

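        // DeadlineContext is an application-defined holder (for example a ThreadLocal
        // filled from the incoming request's header), not a Spring class
        Instant deadline = DeadlineContext.getDeadline();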
        if (deadline != null) {
            request.getHeaders().set("X-Request-Deadline", deadline.toString());
        }

        return execution.execute(request, body);
    }
}

// Check deadline before expensive operations
public Order processOrder(OrderRequest request) {
    if (DeadlineContext.isExpired()) {
        throw new DeadlineExceededException();
    }

    // Continue processing...
}
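`DeadlineContext` is not defined above. One possible shape, assuming a servlet-style thread-per-request model, is a `ThreadLocal` that an inbound filter fills from `X-Request-Deadline` and clears when the request ends. This is a hypothetical sketch: only `getDeadline()` and `isExpired()` mirror the calls used above, while `setFromHeader`, `clear` and `useClock` are invented for illustration.

```java
import java.time.Clock;
import java.time.Instant;

// Hypothetical deadline holder; not a real library class
final class DeadlineContext {
    private static final ThreadLocal<Instant> DEADLINE = new ThreadLocal<>();
    private static volatile Clock clock = Clock.systemUTC();

    private DeadlineContext() {}

    // Called at the edge (e.g. a servlet filter) with the X-Request-Deadline value.
    // Instant.parse throws DateTimeParseException on a malformed header.
    static void setFromHeader(String headerValue) {
        DEADLINE.set(headerValue == null ? null : Instant.parse(headerValue));
    }

    static Instant getDeadline() {
        return DEADLINE.get();
    }

    // No deadline means the caller imposed no budget, so it is never expired
    static boolean isExpired() {
        Instant deadline = DEADLINE.get();
        return deadline != null && !clock.instant().isBefore(deadline);
    }

    // Must run when the request finishes, or pooled threads leak the old deadline
    static void clear() {
        DEADLINE.remove();
    }

    // Test hook for a fixed clock
    static void useClock(Clock c) {
        clock = c;
    }
}
```

A `ThreadLocal` does not follow work onto other threads (for example `CompletableFuture.supplyAsync` or a reactive pipeline), so asynchronous code has to copy the deadline explicitly.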

gRPC Deadlines

gRPC has built-in deadline support:

// Client sets deadline
OrderServiceGrpc.OrderServiceBlockingStub stub = OrderServiceGrpc
    .newBlockingStub(channel)
    .withDeadlineAfter(10, TimeUnit.SECONDS);

// Server checks the deadline (getDeadline() returns null if the client set none)
Deadline deadline = Context.current().getDeadline();
if (deadline != null && deadline.isExpired()) {
    throw Status.DEADLINE_EXCEEDED.asRuntimeException();
}

Timeout vs Circuit Breaker

Aspect	Timeout	Circuit Breaker
Purpose	Limit wait time for single call	Prevent calls to failing service
Scope	Per request	Across requests
Trigger	Call takes too long	Failure rate exceeds threshold
Result	Request fails	Request rejected immediately

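Use them together: the timeout makes an individual slow call fail, and the circuit breaker counts those timeouts as failures, so a consistently slow service is cut off instead of tying up threads on every request.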
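To show how the two interact, here is a toy breaker, not Resilience4j's API: each call is bounded by `CompletableFuture.orTimeout`, and a timeout increments the same failure counter as any other error. It opens after a fixed number of consecutive failures and, for brevity, has no half-open state or reset timer, which a real breaker needs in order to recover.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Toy circuit breaker: opens after N consecutive failures; timeouts count as failures
class TimeoutAwareBreaker {
    private final int failureThreshold;
    private final long timeoutMillis;
    private int consecutiveFailures = 0;

    TimeoutAwareBreaker(int failureThreshold, long timeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.timeoutMillis = timeoutMillis;
    }

    synchronized boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    <T> T call(Supplier<T> remoteCall, T fallback) {
        if (isOpen()) {
            return fallback; // rejected immediately, without waiting on the slow service
        }
        try {
            // The timeout bounds this single call
            T result = CompletableFuture.supplyAsync(remoteCall)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .join();
            synchronized (this) { consecutiveFailures = 0; }
            return result;
        } catch (CompletionException e) {
            // A TimeoutException cause and any other error both count as failures
            synchronized (this) { consecutiveFailures++; }
            return fallback;
        }
    }
}
```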

Timeout Strategy by Service

Service Type	Timeout	Rationale
Database	5-10s	Queries should be fast
Cache (Redis)	100-500ms	Cache should be very fast
Internal service	5-15s	Within cluster, should be fast
External API	15-30s	Third-party may be slow
Payment gateway	30-60s	May involve bank processing
File upload	60-120s	Large files take time

Handling Timeout Errors

try {
    return restTemplate.getForObject(url, Order.class);
} catch (ResourceAccessException e) {
    if (e.getCause() instanceof SocketTimeoutException) {
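        // Read timeout. With the JDK-based SimpleClientHttpRequestFactory, a connect
        // timeout is also a SocketTimeoutException and would land here too.
        log.warn("Timeout calling order service: {}", url);
        throw new ServiceTimeoutException("Order service timeout", e);
    }
    if (e.getCause() instanceof ConnectTimeoutException) {
        // Connect timeout (Apache HttpClient's ConnectTimeoutException, e.g. with
        // HttpComponentsClientHttpRequestFactory)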
        log.error("Cannot connect to order service: {}", url);
        throw new ServiceUnavailableException("Order service unreachable", e);
    }
    throw e;
}
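The exception types above depend on the underlying HTTP library. With the JDK's `java.net.http.HttpClient`, a connect timeout raises `HttpConnectTimeoutException`, which is a subclass of `HttpTimeoutException`, so the more specific check has to come first. A minimal classification sketch; the `Kind` enum is illustrative.

```java
import java.io.IOException;
import java.net.http.HttpConnectTimeoutException;
import java.net.http.HttpTimeoutException;

class TimeoutClassifier {
    enum Kind { CONNECT_TIMEOUT, REQUEST_TIMEOUT, OTHER_IO }

    // HttpConnectTimeoutException extends HttpTimeoutException,
    // so the subclass must be tested before the superclass
    static Kind classify(IOException e) {
        if (e instanceof HttpConnectTimeoutException) {
            return Kind.CONNECT_TIMEOUT; // peer unreachable: treat as "service unavailable"
        }
        if (e instanceof HttpTimeoutException) {
            return Kind.REQUEST_TIMEOUT; // connected, but no response in time
        }
        return Kind.OTHER_IO;
    }
}
```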

Resilience4j TimeLimiter

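TimeLimiterConfig config = TimeLimiterConfig.custom()
    .timeoutDuration(Duration.ofSeconds(10))
    .cancelRunningFuture(true)  // Cancels the future on timeout; a CompletableFuture is NOT interrupted by this
    .build();

TimeLimiter timeLimiter = TimeLimiter.of("orderService", config);

// Supply a fresh future per execution so the call starts inside the limiter.
// executeFutureSupplier is declared to throw Exception (a TimeoutException on timeout).
Order order = timeLimiter.executeFutureSupplier(
    () -> CompletableFuture.supplyAsync(() -> orderClient.getOrder(orderId))
);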
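The JDK alone (Java 9+) offers similar behavior through `CompletableFuture.orTimeout` and `completeOnTimeout`. A minimal sketch, where `slowLookup` is a hypothetical stand-in for `orderClient.getOrder`. Timing out the future does not stop the sleeping task; it keeps running in the background.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class JdkTimeLimiter {
    // Hypothetical slow dependency standing in for orderClient.getOrder(orderId)
    static String slowLookup(String orderId, long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "order-" + orderId;
    }

    // Fails the future with TimeoutException if it has not completed in time
    static CompletableFuture<String> limited(String orderId, long delayMillis, long timeoutMillis) {
        return CompletableFuture.supplyAsync(() -> slowLookup(orderId, delayMillis))
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Completes with a fallback value instead of failing
    static CompletableFuture<String> withFallback(String orderId, long delayMillis, long timeoutMillis) {
        return CompletableFuture.supplyAsync(() -> slowLookup(orderId, delayMillis))
            .completeOnTimeout("fallback-" + orderId, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // join() wraps the TimeoutException in a CompletionException
    static boolean timedOut(CompletableFuture<?> future) {
        try {
            future.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }
}
```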

With Annotation

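@TimeLimiter(name = "orderService", fallbackMethod = "orderFallback")
public CompletableFuture<Order> getOrder(String orderId) {
    return CompletableFuture.supplyAsync(() ->
        orderClient.getOrder(orderId)
    );
}

// Fallback: same parameters plus one trailing exception parameter, same return type
private CompletableFuture<Order> orderFallback(String orderId, TimeoutException e) {
    // orderCache is a hypothetical local cache
    return CompletableFuture.completedFuture(orderCache.get(orderId));
}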

Common Mistakes

Mistake	Impact	Fix
No timeout configured	Threads blocked forever	Always set timeouts
Timeout too long	Poor user experience	Match SLA requirements
Timeout too short	False failures	Allow for normal variance
Same timeout everywhere	Suboptimal behavior	Tune per service
Ignoring deadline propagation	Chain exceeds total budget	Pass remaining time

Timeout Anti-Pattern: The Timeout Cliff

When multiple services in a chain have the same timeout:

Service A (timeout: 10s) → Service B (timeout: 10s) → Service C (timeout: 10s)

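If Service B does 2s of its own work and then waits 9s for Service C, Service A's 10s timeout fires before B can respond, even though C succeeded. The work B and C did is wasted.

Fix: Decrease timeouts down the chain so each inner call gives up before its caller does: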

Service A (timeout: 10s) → Service B (timeout: 7s) → Service C (timeout: 4s)
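One way to derive such numbers is to subtract a fixed reserve per hop for the caller's own work and network overhead. A minimal sketch; the 3s reserve is an assumption chosen to reproduce the 10s / 7s / 4s chain above, and in practice it would come from each hop's measured latency.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class ChainTimeouts {
    // Each hop gets its caller's timeout minus a reserve for the caller's own work,
    // so an inner timeout always fires before the outer one
    static List<Duration> plan(Duration edgeTimeout, Duration reservePerHop, int hops) {
        List<Duration> timeouts = new ArrayList<>();
        Duration current = edgeTimeout;
        for (int i = 0; i < hops; i++) {
            if (current.isNegative() || current.isZero()) {
                throw new IllegalArgumentException("budget exhausted at hop " + i);
            }
            timeouts.add(current);
            current = current.minus(reservePerHop);
        }
        return timeouts;
    }
}
```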

How do you choose the right timeout value?

Consider: (1) the p99 latency of the downstream service: set the timeout 2-3x higher. (2) User experience: how long can users wait? (3) SLA requirements: what is the total latency budget? (4) Service type: a cache should be fast (500ms), a payment can be slow (30s). Start conservative, then tune based on production data.
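Point (1) as arithmetic: a minimal sketch that takes observed latencies, computes a nearest-rank p99, multiplies it, and caps the result at the overall latency budget from point (3). The multiplier and cap are inputs, not recommendations, and real systems usually read percentiles from their metrics backend rather than from raw samples.

```java
import java.time.Duration;
import java.util.Arrays;

class TimeoutFromLatency {
    // Nearest-rank percentile: smallest sample with at least p% of samples at or below it
    static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
    }

    // Timeout = p99 * multiplier, capped at the total latency budget
    static Duration suggest(long[] latenciesMillis, double multiplier, Duration maxBudget) {
        long p99 = percentile(latenciesMillis, 99.0);
        Duration candidate = Duration.ofMillis(Math.round(p99 * multiplier));
        return candidate.compareTo(maxBudget) > 0 ? maxBudget : candidate;
    }
}
```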

What is deadline propagation and why is it important?

Deadline propagation passes the remaining time budget through a call chain. Without it, each service has its own timeout, and the total time can far exceed what the client expects. With it, each service knows how much time remains and can fail early if the deadline is already exceeded.

What happens to the request on the server when the client times out?

The server doesn't know the client gave up — it may continue processing. This can waste resources or cause issues if the operation has side effects. Mitigations: (1) Use deadline propagation — server checks and aborts. (2) Design for idempotency. (3) Use request cancellation (gRPC supports this).
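Mitigation (2) can be sketched as an idempotency key: the client sends the same key when it retries after a timeout, and the server returns the stored result instead of repeating the side effect. `IdempotentOrderService` and its in-memory map are hypothetical; a production version would need a shared, durable store with expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service: remembers the result per client-supplied idempotency key,
// so a retry after a client-side timeout does not charge twice
class IdempotentOrderService {
    private final Map<String, String> resultsByKey = new ConcurrentHashMap<>();
    private final AtomicInteger charges = new AtomicInteger();

    String placeOrder(String idempotencyKey, String item) {
        // computeIfAbsent runs the side effect at most once per key
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> {
            int chargeNumber = charges.incrementAndGet(); // the side effect
            return "order-" + chargeNumber + ":" + item;
        });
    }

    int chargeCount() {
        return charges.get();
    }
}
```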
