Service Layer AI Pattern¶
The Pattern: Service Layer Orchestration¶
The AI-enabled service layer acts as an orchestrator that:
- Handles business logic
- Fetches required data
- Builds context for the LLM
- Calls AI when appropriate
- Validates and processes output
- Returns enhanced results
Anatomy of an AI Service¶
Step-by-Step Flow¶
Java
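@Service
public class ProductSearchService {
    // Placeholder system prompt; the wording is illustrative only
    private static final String SYSTEM_PROMPT =
        "You are a shopping assistant. Recommend only products listed in the context.";

    private final ProductRepository productRepository;
    private final AiClient aiClient;

    // Constructor injection: Spring supplies both collaborators
    public ProductSearchService(ProductRepository productRepository, AiClient aiClient) {
        this.productRepository = productRepository;
        this.aiClient = aiClient;
    }

    public ProductSearchResponse searchProducts(ProductSearchRequest request) {
        // Step 1: Fetch data from database
        List<Product> results = productRepository.searchProducts(request.getQuery());
        // Step 2: Build context (business logic + data)
        String context = buildSearchContext(request, results);
        // Step 3: Call LLM with context
        String aiResponse = aiClient.generateResponseWithContext(SYSTEM_PROMPT, context);
        // Step 4: Validate and process LLM output
        String enhancedSuggestion = validateAndClean(aiResponse);
        // Step 5: Build response to client
        return buildResponse(results, enhancedSuggestion);
    }
}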
Responsibilities¶
| Responsibility | Example | Why Service Layer |
|---|---|---|
| Fetch Data | Query ProductRepository | Service has repository access |
| Build Context | Add user preferences, product details | Service understands domain |
| Call AI | AiClient.generateResponse() | Service controls when to call |
| Validate Output | Ensure response matches schema | Service knows business rules |
| Enhance Results | Add scores, filter, rank | Service can enhance original data |
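The table names output validation as a service responsibility, but `validateAndClean` is never shown. The following is a minimal sketch of what it could look like, assuming the suggestion is plain text. The class name `AiOutputValidator`, the 500-character cap, and the choice to return an `Optional` (rather than a bare `String`) are all assumptions made for illustration.

```java
import java.util.Optional;

// Hypothetical helper; the rules below are illustrative assumptions
final class AiOutputValidator {
    static final int MAX_LENGTH = 500; // assumed cap on suggestion length

    // Returns a cleaned suggestion, or empty if the raw output is unusable
    static Optional<String> validateAndClean(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        // Drop control characters (keeping newline and tab), then trim
        String cleaned = raw.replaceAll("[\\p{Cntrl}&&[^\\n\\t]]", "").trim();
        if (cleaned.isEmpty()) {
            return Optional.empty();
        }
        // Truncate overly long output instead of rejecting it
        if (cleaned.length() > MAX_LENGTH) {
            cleaned = cleaned.substring(0, MAX_LENGTH).trim();
        }
        return Optional.of(cleaned);
    }
}
```

An empty result gives the service a clear signal to fall back to the traditional response instead of showing a blank or malformed suggestion.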
Key Concept: Separation of Concerns¶
❌ Anti-Pattern: AI in Controller¶
Java
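@RestController
@RequestMapping("/products/search")
public class ProductSearchController {
    private final ProductRepository repository;
    private final AiClient aiClient;

    public ProductSearchController(ProductRepository repository, AiClient aiClient) {
        this.repository = repository;
        this.aiClient = aiClient;
    }

    @GetMapping
    public ResponseEntity<Response> search(@RequestParam String query) {
        // ❌ Bad: Controller is handling business logic
        List<Product> results = repository.search(query);
        String context = buildComplexContext(results);
        String aiResponse = aiClient.generateResponse(context);
        // Controller now knows about:
        // - Prompt engineering
        // - LLM error handling
        // - Response parsing
        // This violates SRP!
        return ResponseEntity.ok(new Response(results, aiResponse)); // Response constructor is assumed
    }
}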
Problems:
- Controller has too many responsibilities
- Hard to test
- Couples HTTP handling with AI logic
- Logic can't be reused from other endpoints
✅ Pattern: AI in Service¶
Java
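@Service
public class ProductSearchService {
    private final ProductRepository productRepository;
    private final AiClient aiClient;

    public ProductSearchService(ProductRepository productRepository, AiClient aiClient) {
        this.productRepository = productRepository;
        this.aiClient = aiClient;
    }

    public ProductSearchResponse searchProducts(ProductSearchRequest request) {
        // All AI orchestration here
        return enhanceWithAI(request);
    }

    private ProductSearchResponse enhanceWithAI(ProductSearchRequest request) {
        // Context building, prompt engineering, error handling, and
        // response parsing are all encapsulated in the service
        List<Product> results = productRepository.searchProducts(request.getQuery());
        String aiResponse = aiClient.generateResponse(buildSearchContext(request, results));
        return buildResponse(results, validateAndClean(aiResponse));
    }
}

@RestController
public class ProductSearchController {
    private final ProductSearchService service;

    public ProductSearchController(ProductSearchService service) {
        this.service = service;
    }

    @GetMapping("/products/search")
    public ResponseEntity<ProductSearchResponse> search(ProductSearchRequest req) {
        // ✅ Good: Controller delegates to service
        return ResponseEntity.ok(service.searchProducts(req));
    }
}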
Benefits:
- Single Responsibility Principle
- Easy to test (mock service or AI client)
- Reusable from other controllers/APIs
- Clean separation
Context Building Strategy¶
The quality of AI output depends on the quality of context provided.
Poor Context: Vague¶
Java
String prompt = "User: " + request.getQuery();
// LLM doesn't know:
// - What products are available?
// - What is the user's history?
// - What should we recommend?
Good Context: Rich and Structured¶
Java
String context = buildSearchContext(request, results);
// Returns something like:
/*
USER SEARCH QUERY: "laptop with good performance"
SEARCH FILTERS:
- Category: Electronics
- Max Price: $2000
AVAILABLE PRODUCTS (6 found):
1. Premium Laptop Pro - $1899
- Processor: Intel i9
- RAM: 32GB
- GPU: RTX 4080
- Storage: 1TB SSD
- Category: Electronics
2. Budget Laptop Basic - $599
- Processor: Intel i5
- RAM: 8GB
- GPU: Integrated
- Storage: 256GB SSD
- Category: Electronics
[... more products ...]
TASK: Analyze user query and available products.
Recommend the BEST 3 products for this user and explain why.
*/
Building Context in Code¶
Java
private String buildSearchContext(ProductSearchRequest request, List<Product> results) {
    StringBuilder context = new StringBuilder();
    // User's intent
    context.append("USER SEARCH:\n");
    context.append("- Query: ").append(request.getQuery()).append("\n");
    context.append("- Category: ").append(request.getCategory()).append("\n");
    context.append("- Max Budget: $").append(request.getMaxPrice()).append("\n");
    context.append("\n");
    // Available products with details
    context.append("AVAILABLE PRODUCTS (").append(results.size()).append(" found):\n");
    // List.forEach passes no index, so use an indexed loop for the numbering
    for (int i = 0; i < results.size(); i++) {
        Product product = results.get(i);
        context.append(i + 1).append(". ").append(product.getName()).append("\n");
        context.append("   Price: $").append(product.getPrice()).append("\n");
        context.append("   Description: ").append(product.getDescription()).append("\n");
        context.append("   Stock: ").append(product.getStockQuantity()).append("\n");
        context.append("\n");
    }
    return context.toString();
}
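The builder above leaves out the TASK section shown in the earlier example output, and it would print `null` for any filter the user did not set. The sketch below is one way to address both. The `Product` and `SearchRequest` records are simplified, hypothetical stand-ins for the document's types, and the TASK wording is only an example.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the document's Product and ProductSearchRequest types
record Product(String name, double price) {}
record SearchRequest(String query, String category, Double maxPrice) {}

final class SearchContextBuilder {
    static String build(SearchRequest request, List<Product> results) {
        StringBuilder sb = new StringBuilder("USER SEARCH:\n");
        sb.append("- Query: ").append(request.query()).append("\n");
        // Only mention filters the user actually set, so "null" never reaches the prompt
        if (request.category() != null) {
            sb.append("- Category: ").append(request.category()).append("\n");
        }
        if (request.maxPrice() != null) {
            sb.append(String.format(Locale.ROOT, "- Max Budget: $%.2f\n", request.maxPrice()));
        }
        sb.append("\nAVAILABLE PRODUCTS (").append(results.size()).append(" found):\n");
        for (int i = 0; i < results.size(); i++) {
            Product p = results.get(i);
            sb.append(String.format(Locale.ROOT, "%d. %s - $%.2f\n", i + 1, p.name(), p.price()));
        }
        // Close with an explicit instruction so the model knows what to produce
        sb.append("\nTASK: Recommend the best 3 products for this user and explain why.\n");
        return sb.toString();
    }
}
```

Formatting with `Locale.ROOT` keeps prices rendered the same way regardless of the server's default locale, which helps keep prompts stable across environments.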
Error Handling Strategy¶
Graceful Degradation¶
Java
try {
// Primary: Try with AI enhancement
if (aiClient.isAvailable()) {
String aiResponse = aiClient.generateResponse(context);
return buildEnhancedResponse(results, aiResponse);
}
} catch (TimeoutException e) {
logger.warn("AI request timed out, using fallback");
// Monitor: Log for observability
} catch (RateLimitException e) {
logger.warn("AI rate limit exceeded, using fallback");
// Monitor: Track rate limit issues
} catch (Exception e) {
logger.error("Unexpected AI error", e);
// Monitor: Alert on unexpected errors
}
// Fallback: Return traditional results
return buildTraditionalResponse(results);
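If several service methods need the same degrade-to-fallback behavior, the try/catch can be pulled into a small helper. This is a sketch under assumptions: `AiFallback` and `withFallback` are hypothetical names, the AI call is assumed to throw only unchecked exceptions, and a real service would log through its logger rather than `System.err`.

```java
import java.util.function.Supplier;

// Hypothetical helper that wraps any AI call with a fallback
final class AiFallback {
    // Returns the AI result, or the fallback if the call throws or yields null
    static <T> T withFallback(Supplier<T> aiCall, Supplier<T> fallback) {
        try {
            T result = aiCall.get();
            if (result != null) {
                return result;
            }
        } catch (RuntimeException e) {
            // Record the failure for observability, then degrade
            System.err.println("AI call failed, using fallback: " + e.getMessage());
        }
        return fallback.get();
    }
}
```

The fallback is a `Supplier` so the traditional response is only built when it is actually needed.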
Timeout Management¶
Java
try {
    // Set a timeout for AI calls
    String response = aiClient.generateResponseWithTimeout(
        prompt,
        Duration.ofSeconds(3) // Don't wait longer than 3s
    );
    return buildEnhancedResponse(results, response);
} catch (TimeoutException e) {
    // Don't let a slow LLM delay the user experience
    return fallbackResponse;
}
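`generateResponseWithTimeout` is not defined in this document. If the client has no built-in timeout, one possible way to get the same effect is to run the call on another thread and bound the wait with `CompletableFuture.get`. In the sketch below, `TimedAiCall` and `callWithTimeout` are hypothetical names.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical wrapper that bounds how long the caller waits for the AI
final class TimedAiCall {
    static String callWithTimeout(Supplier<String> aiCall, Duration timeout, String fallback) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(aiCall);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future as cancelled; the underlying task is not interrupted
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The AI call itself threw
            return fallback;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for callers further up the stack
            Thread.currentThread().interrupt();
            return fallback;
        }
    }
}
```

Because the background task keeps running after the timeout, a client-level timeout (for example on the HTTP connection) is still worth configuring where the client supports it.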
Caching Patterns¶
Full Response Caching¶
Java
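@Service
@CacheConfig(cacheNames = "productSearchCache")
public class ProductSearchService {
    // Key on every field that changes the result, not just the query text;
    // otherwise two searches with different filters would share one cache entry
    @Cacheable(key = "{#request.query, #request.category, #request.maxPrice, #request.useAIEnhancement}")
    public ProductSearchResponse searchProducts(ProductSearchRequest request) {
        // First call hits DB and LLM
        // Second identical call uses cache
        return computeResponse(request);
    }
}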
Partial Caching¶
Java
public ProductSearchResponse searchProducts(ProductSearchRequest request) {
List<Product> results = repository.search(request.getQuery()); // Always fresh
// Cache the expensive AI part
String aiSuggestion = cache.getOrCompute(
cacheKey(request),
() -> aiClient.generateResponse(buildContext(results))
);
return buildResponse(results, aiSuggestion);
}
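The `cache.getOrCompute` and `cacheKey` calls above are not defined in this document. A minimal in-memory sketch of both, assuming a single-node deployment and no need for expiry, could be built on `ConcurrentHashMap.computeIfAbsent`. The class name `SuggestionCache` and the key format are assumptions.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical in-memory cache for AI suggestions (no TTL, no eviction)
final class SuggestionCache {
    private final ConcurrentHashMap<String, String> entries = new ConcurrentHashMap<>();

    // Runs the loader only when the key is absent; later calls reuse the stored value
    String getOrCompute(String key, Supplier<String> loader) {
        return entries.computeIfAbsent(key, k -> loader.get());
    }

    // Normalizes the query so "Laptop " and "laptop" share one entry
    static String cacheKey(String query, String category, Double maxPrice) {
        return String.join("|",
            query.trim().toLowerCase(Locale.ROOT),
            String.valueOf(category),
            String.valueOf(maxPrice));
    }
}
```

Since the product list is always fetched fresh while the suggestion is cached, a cached suggestion can mention products that are no longer in the results. A production cache would likely need an expiry policy to limit that drift.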
Decision Points in Service¶
When to Call AI?¶
Java
public ProductSearchResponse searchProducts(ProductSearchRequest request) {
List<Product> results = repository.search(request.getQuery());
// Decision 1: User asked for AI enhancement
if (!Boolean.TRUE.equals(request.getUseAIEnhancement())) {
return buildTraditionalResponse(results);
}
// Decision 2: Enough results to analyze
if (results.isEmpty()) {
return buildEmptyResponse();
}
// Decision 3: AI client is available
if (!aiClient.isAvailable()) {
return buildTraditionalResponse(results);
}
// Decision 4: Within rate limits
if (rateLimiter.isExceeded()) {
return buildTraditionalResponse(results);
}
// All checks pass: Call AI
return enhanceWithAI(results, request);
}
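The `rateLimiter` used in Decision 4 is not defined in this document. As an illustration only, a fixed-window counter is one simple way such a check could work. `FixedWindowRateLimiter` is a hypothetical class, and in this sketch each call to `isExceeded()` also counts as one attempt, which may differ from the limiter the original code assumes.

```java
import java.util.function.LongSupplier;

// Hypothetical fixed-window limiter: at most maxCalls attempts per window
final class FixedWindowRateLimiter {
    private final int maxCalls;
    private final long windowMillis;
    private final LongSupplier clock; // injectable so tests can control time
    private long windowStart;
    private int callsInWindow;

    FixedWindowRateLimiter(int maxCalls, long windowMillis, LongSupplier clock) {
        this.maxCalls = maxCalls;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    // Records one attempt and reports whether it goes over the window's budget
    synchronized boolean isExceeded() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // A new window has started: reset the counter
            windowStart = now;
            callsInWindow = 0;
        }
        callsInWindow++;
        return callsInWindow > maxCalls;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. A fixed window allows short bursts at window boundaries, so a token bucket may be a better fit when the provider enforces smoother limits.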
Testing Strategies¶
Unit Test with Mock AI¶
Java
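import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertTrue;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.mock.mockito.MockBean;

@SpringBootTest
public class ProductSearchServiceTest {
    @MockBean
    private AiClient aiClient;
    @Autowired
    private ProductSearchService service;

    @Test
    void testSearchWithAIEnhancement() {
        // Arrange: an unstubbed mock returns false for isAvailable(),
        // so the service would skip the AI call without this line
        when(aiClient.isAvailable()).thenReturn(true);
        when(aiClient.generateResponse(anyString()))
            .thenReturn("MockAI response: Recommend laptop X");
        ProductSearchRequest request = new ProductSearchRequest(
            "laptop", null, null, true
        );
        // Act
        ProductSearchResponse response = service.searchProducts(request);
        // Assert
        assertTrue(response.isEnhancedWithAI());
        assertNotNull(response.getAiSuggestion());
    }

    @Test
    void testFallbackWhenAIUnavailable() {
        // Arrange: AI enhancement is requested, but the client is down
        when(aiClient.isAvailable()).thenReturn(false);
        ProductSearchRequest request = new ProductSearchRequest(
            "laptop", null, null, true
        );
        // Act
        ProductSearchResponse response = service.searchProducts(request);
        // Assert
        assertFalse(response.isEnhancedWithAI());
    }
}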
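The tests above start a Spring context, which makes them closer to integration tests. Because the service receives its AI client through the constructor, the same behavior can also be checked with a hand-written fake and no framework at all. The sketch below uses a deliberately simplified service. The `AiClient` interface shape, `SearchService`, and `FakeAiClient` are assumptions for illustration, not the document's actual types.

```java
import java.util.List;

// Assumed shape of the AI client; the document does not define this interface
interface AiClient {
    boolean isAvailable();
    String generateResponse(String context);
}

// Simplified stand-in for ProductSearchService
final class SearchService {
    private final AiClient aiClient;

    SearchService(AiClient aiClient) {
        this.aiClient = aiClient;
    }

    // Returns the AI suggestion, or null when falling back to plain results
    String suggest(List<String> productNames) {
        if (productNames.isEmpty() || !aiClient.isAvailable()) {
            return null;
        }
        return aiClient.generateResponse("PRODUCTS: " + String.join(", ", productNames));
    }
}

// Hand-written fake: records the context it was given and returns a canned answer
final class FakeAiClient implements AiClient {
    private final boolean available;
    String lastContext;

    FakeAiClient(boolean available) {
        this.available = available;
    }

    @Override
    public boolean isAvailable() {
        return available;
    }

    @Override
    public String generateResponse(String context) {
        lastContext = context;
        return "Recommend laptop X";
    }
}
```

A fake that records `lastContext` also lets a test assert on the prompt the service built, which is often where regressions appear.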
Real-World Considerations¶
Thread Safety¶
When a request needs several independent lookups or LLM calls, consider running them concurrently and combining the results:
Java
// Java has no async keyword; returning a CompletableFuture makes the method asynchronous
public CompletableFuture<ProductSearchResponse> searchAsync(
        ProductSearchRequest request) {
    // Fetch data in parallel
    CompletableFuture<List<Product>> productsFuture =
        CompletableFuture.supplyAsync(() ->
            repository.search(request.getQuery())
        );
    CompletableFuture<List<Product>> relatedFuture =
        CompletableFuture.supplyAsync(() ->
            repository.findRelated(request.getCategory())
        );
    // Combine results once both futures have completed
    return productsFuture.thenCombineAsync(relatedFuture,
        (products, related) -> enhanceWithAI(products, related));
}
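The method above does not say what happens if one of the parallel lookups fails. One option, sketched here with plain strings in place of the document's types, is to attach `exceptionally` so the combined future completes with a fallback instead of an exception. `ParallelFetch` and `fetchAndCombine` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch: run two lookups concurrently and degrade if either fails
final class ParallelFetch {
    static CompletableFuture<String> fetchAndCombine(
            Supplier<List<String>> products, Supplier<List<String>> related) {
        CompletableFuture<List<String>> productsFuture = CompletableFuture.supplyAsync(products);
        CompletableFuture<List<String>> relatedFuture = CompletableFuture.supplyAsync(related);
        return productsFuture
            // Runs only after both lookups have completed successfully
            .thenCombine(relatedFuture, (ps, rs) -> "products=" + ps + ", related=" + rs)
            // Any failure in either lookup lands here
            .exceptionally(ex -> "fallback");
    }
}
```

By default `supplyAsync` uses the common fork-join pool. For blocking database or HTTP calls, passing a dedicated `Executor` as the second argument is usually the safer choice.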
Resource Cleanup¶
Java
try (var aiContext = aiClient.createContext()) {
String response = aiContext.generate(prompt);
return process(response);
} catch (Exception e) {
// Resources automatically cleaned up
throw new AIException(e);
}
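For try-with-resources to work, whatever `createContext()` returns has to implement `AutoCloseable`. The document does not show that type, so the sketch below uses a hypothetical `AiContext` with a trivial `generate` method to demonstrate that `close()` runs even when generation throws.

```java
// Hypothetical resource type; a real one would hold connections or buffers
final class AiContext implements AutoCloseable {
    boolean closed;

    String generate(String prompt) {
        if (prompt.isBlank()) {
            throw new IllegalArgumentException("empty prompt");
        }
        return "echo: " + prompt; // placeholder for a real model call
    }

    @Override
    public void close() {
        // Release whatever the context holds; here we only record the call
        closed = true;
    }
}
```

Declaring `close()` without a `throws` clause means callers are not forced to handle a checked exception just to use the resource.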