Comprehensive Error Handling¶
Error Types & Strategies¶
1. Timeout Errors¶
Cause: AI takes >threshold time
Java
@Service
public class TimeoutHandler {
public String handleTimeout(String prompt) {
try {
return aiClient.generateResponse(prompt)
.orTimeout(200, TimeUnit.MILLISECONDS) // User-facing
.get();
} catch (TimeoutException e) {
log.warn("AI timeout, using fallback", e);
metrics.counter("ai.timeout").increment();
// Fallback 1: Cached response
String cached = cache.get(hashPrompt(prompt));
if (cached != null) return cached;
// Fallback 2: Traditional ranking
return traditionalResponse(prompt);
}
}
}
2. Network Errors¶
Cause: Network unavailable, DNS failures
Java
@Service
public class NetworkErrorHandler {
private static final int MAX_RETRIES = 3;
public String handleNetworkError(String prompt) {
for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
try {
return aiClient.generateResponse(prompt);
} catch (IOException e) {
if (attempt < MAX_RETRIES) {
// Exponential backoff: 100ms, 200ms, 400ms
long backoff = (long) (100 * Math.pow(2, attempt - 1));
Thread.sleep(backoff);
} else {
log.error("All retries failed", e);
metrics.counter("ai.network_error").increment();
throw new ServiceUnavailableException("AI service down", e);
}
}
}
}
}
3. Invalid Responses¶
Cause: AI returns malformed/unparseable response
Java
@Service
public class ResponseValidator {
public List<Product> parseAiResponse(String response) {
try {
// Validate JSON structure
JsonNode root = objectMapper.readTree(response);
if (!root.isArray()) {
throw new InvalidResponseException(
"Response must be array, got: " +
root.getNodeType()
);
}
// Validate each item
List<Product> products = new ArrayList<>();
for (JsonNode item : root) {
if (!item.has("productId") || !item.has("rank")) {
throw new InvalidResponseException(
"Missing required fields"
);
}
products.add(parseProduct(item));
}
return products;
} catch (IOException e) {
log.error("Failed to parse AI response", e);
metrics.counter("ai.parse_error").increment();
throw new InvalidResponseException("Unparseable response", e);
}
}
}
4. Rate Limiting / Cost Control¶
Cause: Too many API calls, budget exceeded
Java
@Service
public class CostController {
private final AtomicDouble dailySpend = new AtomicDouble(0);
private final double dailyBudget = 100;
public String handleBudgetExceeded(String prompt) {
double requestCost = estimateCost(prompt);
if (dailySpend.get() + requestCost > dailyBudget) {
log.warn("Daily budget exceeded, using fallback");
metrics.counter("budget.exceeded").increment();
// Fallback: Traditional response without AI
return traditionalResponse(prompt);
}
String result = aiClient.generateResponse(prompt);
dailySpend.addAndGet(requestCost);
return result;
}
}
5. Provider Failure (Multi-Provider Fallback)¶
Cause: Selected provider unavailable
Java
@Service
public class MultiProviderAiClient implements AiClient {
private final List<AiClient> providers = Arrays.asList(
new OpenAiClient(),
new AnthropicClient(),
new OllamaClient()
);
@Override
public String generateResponse(String prompt) {
Exception lastError = null;
for (AiClient provider : providers) {
try {
return provider.generateResponse(prompt);
} catch (Exception e) {
lastError = e;
log.warn("Provider failed, trying next", e);
metrics.counter("provider.failover")
.tag("provider", provider.getModelName())
.increment();
}
}
// All providers failed
log.error("All AI providers unavailable", lastError);
throw new AllProvidersFailedException("No AI available", lastError);
}
}
Error Recovery Strategies¶
| Error Type | Strategy | Risk | Impact |
|---|---|---|---|
| Timeout | Fallback | Low | Slight quality decrease |
| Network | Retry + exponential backoff | Low | Latency increase |
| Invalid response | Use cached response | Medium | Stale data |
| Rate limiting | Use traditional method | Medium | Loss of enhancement |
| All providers down | Return default results | High | Feature disabled |
Monitoring & Alerting¶
YAML
alerts:
- name: HighErrorRate
condition: error_rate > 0.05
duration: 5m
action: page_oncall
- name: AllProvidersFailed
condition: ai_available == false
duration: 30s
action: page_oncall_critical
- name: HighTimeoutRate
condition: timeout_rate > 0.10
duration: 10m
action: notify_team
- name: InvalidResponses
condition: parse_error_rate > 0.02
duration: 5m
action: notify_team
Testing Error Scenarios¶
Java
@SpringBootTest
class AiErrorHandlingTest {
@MockBean
private AiClient aiClient;
@Test
void testTimeoutFallback() {
Mockito.doThrow(new TimeoutException())
.when(aiClient).generateResponse(anyString());
String result = service.search(request);
assertNotNull(result);
assertTrue(result.isFallback());
}
@Test
void testInvalidResponseHandling() {
Mockito.when(aiClient.generateResponse(anyString()))
.thenReturn("{invalid json}");
assertThrows(InvalidResponseException.class,
() -> service.search(request));
}
@Test
void testMultiProviderFailover() {
when(openaiClient.generateResponse(anyString()))
.thenThrow(new TimeoutException());
when(claudeClient.generateResponse(anyString()))
.thenThrow(new TimeoutException());
when(ollamaClient.generateResponse(anyString()))
.thenReturn("success");
String result = multiProviderClient.generateResponse(prompt);
assertEquals("success", result);
}
}
Key Principle: Never let AI failures crash the service. Always have a fallback.