Core Concepts
An MCP server is easier to reason about when you treat it as a controlled execution pipeline, not just a chat endpoint.
The model suggests actions, but your orchestrator decides how those actions are executed safely.
https://modelcontextprotocol.io/docs/learn/architecture#transport-layer
MCP request lifecycle
flowchart LR
U[User Prompt] --> API[FastAPI /chat]
API --> E[handle_prompt]
E --> R[route]
R --> T[Tool Execution]
T --> M[MemoryStore.save]
M --> O[trace events]
O --> RESP[Response]
style API fill:#1976d2,color:#fff
style E fill:#1976d2,color:#fff
style R fill:#1976d2,color:#fff
style T fill:#ff9800,color:#fff
style M fill:#ff9800,color:#fff
style O fill:#ff9800,color:#fff
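The lifecycle in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the `Route` dataclass, the toy `route` function, the `TOOLS` registry, the in-memory `MemoryStore`, and the list used for trace events are all hypothetical stand-ins. Only the names `handle_prompt`, `route`, and `MemoryStore.save` come from the diagram.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Route:
    tool: Optional[str]  # name of the chosen tool, or None if nothing matched
    args: dict[str, Any] = field(default_factory=dict)


class MemoryStore:
    """Hypothetical in-memory stand-in for the repository's MemoryStore."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str]] = []

    def save(self, prompt: str, response: str) -> None:
        self.records.append((prompt, response))


def route(prompt: str) -> Route:
    """Toy router (assumption): recognizes 'add A B' and nothing else."""
    parts = prompt.split()
    if len(parts) == 3 and parts[0] == "add":
        try:
            return Route("calculator", {"a": float(parts[1]), "b": float(parts[2])})
        except ValueError:
            pass  # non-numeric operands fall through to "no tool"
    return Route(None)


# Hypothetical tool registry: tool name -> deterministic callable.
TOOLS: dict[str, Callable[..., Any]] = {
    "calculator": lambda a, b: a + b,
}


def handle_prompt(prompt: str, memory: MemoryStore, trace: list[str]) -> str:
    """Run route -> tool execution -> memory save, recording a trace event per stage."""
    trace.append("route")
    decision = route(prompt)

    if decision.tool is None:
        response = "No suitable tool found"
    else:
        trace.append(f"tool:{decision.tool}")
        response = str(TOOLS[decision.tool](**decision.args))

    memory.save(prompt, response)
    trace.append("memory.save")
    return response
```

Because every stage runs inside `handle_prompt`, the API handler only has to pass the prompt in and return the string it gets back.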
Components in this repository
| Component | File | Responsibility |
|---|---|---|
| API boundary | src/api/server.py | Receives prompt payload and returns model or tool output |
| Orchestrator | src/core/engine.py | Coordinates route, execute, memory, and trace |
| Router | src/llm/router.py | Chooses tool and arguments from prompt intent |
| Tool | src/tools/calculator.py | Performs deterministic business logic |
| Memory | src/memory/store.py | Persists prompt and response pairs |
| Observability | src/observability/tracer.py | Emits trace events for debugging and analysis |
Deterministic tool confidence score
A simple way to reason about whether a route should execute is to score how confident the system is in the tool choice.
| Symbol | Meaning |
|---|---|
| C | Overall confidence score for selecting a tool |
| I | Intent match quality between prompt and tool purpose |
| S | Schema compatibility score of extracted arguments |
| H | Historical success score for this tool in similar prompts |
| alpha, beta, gamma | Weights that sum to 1 and reflect your routing priorities |
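The table defines the symbols but not how they combine. Since the weights sum to 1, a natural reading is a weighted sum of the three component scores. This form is an assumption inferred from the definitions, not something stated explicitly here:

```latex
$$
C = \alpha I + \beta S + \gamma H, \qquad \alpha + \beta + \gamma = 1
$$
```

If each of I, S, and H lies between 0 and 1 (also an assumption), then C stays in the same range, which makes a single execution threshold easy to interpret.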
Worked example: if I=0.9, S=0.8, H=0.7, and the weights are alpha=0.5, beta=0.3, gamma=0.2, then C = 0.5(0.9) + 0.3(0.8) + 0.2(0.7) = 0.45 + 0.24 + 0.14 = 0.83, which is usually strong enough to execute automatically.
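The same calculation can be checked in code. The sketch below assumes the score is the weighted sum alpha*I + beta*S + gamma*H and that each component lies in [0, 1]. The function names and the 0.8 cutoff are hypothetical; the right threshold depends on how costly a wrong tool call is in your system.

```python
import math


def tool_confidence(
    intent: float,
    schema: float,
    history: float,
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> float:
    """Return C = alpha*I + beta*S + gamma*H (assumed weighted-sum form)."""
    alpha, beta, gamma = weights
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("weights must sum to 1")
    for name, value in (("intent", intent), ("schema", schema), ("history", history)):
        # Assumption: component scores are normalized to [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return alpha * intent + beta * schema + gamma * history


# Hypothetical cutoff for automatic execution.
AUTO_EXECUTE_THRESHOLD = 0.8


def should_auto_execute(score: float, threshold: float = AUTO_EXECUTE_THRESHOLD) -> bool:
    """Execute automatically only when the score reaches the threshold."""
    return score >= threshold


score = tool_confidence(0.9, 0.8, 0.7)
print(round(score, 2), should_auto_execute(score))  # prints: 0.83 True
```

Scores below the threshold could be routed to a clarifying question or a fallback response instead of being executed.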
Common failure modes and fixes
| Failure mode | Signal | Mitigation |
|---|---|---|
| Tool chosen but wrong args | Validation errors in tool call | Add stronger schema extraction and defaults |
| No tool selected | Frequent "No suitable tool found" responses | Expand intent patterns and fallback behavior |
| Memory noise | Context quality degrades over time | Add retention windows and relevance filters |
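For the first row, one way to catch wrong arguments before they reach the tool is to validate and coerce the router's output against a small schema that also supplies defaults. The sketch below uses only the standard library. `CALCULATOR_SCHEMA`, `REQUIRED`, and `validate_args` are hypothetical names for illustration and do not describe the repository's actual tool contract.

```python
from typing import Any

# Sentinel marking arguments the router must supply.
REQUIRED = object()

# Hypothetical schema for a calculator tool: argument name -> (type, default).
CALCULATOR_SCHEMA: dict[str, tuple[type, Any]] = {
    "a": (float, REQUIRED),
    "b": (float, REQUIRED),
    "op": (str, "add"),
}


def validate_args(raw: dict[str, Any], schema: dict[str, tuple[type, Any]]) -> dict[str, Any]:
    """Coerce extracted arguments to the schema types and apply defaults.

    Raises ValueError listing every problem found, so a single trace event
    shows the full picture instead of only the first failure.
    """
    errors: list[str] = []
    clean: dict[str, Any] = {}
    for name, (expected, default) in schema.items():
        if name not in raw:
            if default is REQUIRED:
                errors.append(f"missing required argument '{name}'")
            else:
                clean[name] = default
            continue
        try:
            clean[name] = expected(raw[name])
        except (TypeError, ValueError):
            errors.append(f"argument '{name}' is not a valid {expected.__name__}")
    unknown = sorted(set(raw) - set(schema))
    if unknown:
        errors.append(f"unexpected arguments: {', '.join(unknown)}")
    if errors:
        raise ValueError("; ".join(errors))
    return clean


print(validate_args({"a": "2", "b": 3}, CALCULATOR_SCHEMA))
# prints: {'a': 2.0, 'b': 3.0, 'op': 'add'}
```

Running this check in the orchestrator rather than inside each tool keeps the validation errors in one place, where they can be traced alongside the routing decision.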
Why not call tools directly from the API layer?
Keeping orchestration in handle_prompt centralizes governance.
It becomes easier to enforce validation, tracing, and memory policy in one place.
What is the minimum production-ready component set?
You need an API boundary, an orchestrator, a router, tool contracts, a memory policy, and observability. Skipping observability creates blind spots during incidents.
How does this relate to classic microservices?
Tools can map to microservice calls, but MCP adds model-mediated intent and schema-driven invocation. You still need the same reliability and security standards as service-to-service systems.
--8<-- "_abbreviations.md"