Core Concepts

An MCP server is easier to reason about when you treat it as a controlled execution pipeline, not just a chat endpoint.

The model suggests actions, but your orchestrator decides how those actions are executed safely.

https://modelcontextprotocol.io/docs/learn/architecture#transport-layer

MCP request lifecycle

flowchart LR
    U[User Prompt] --> API[FastAPI /chat]
    API --> E[handle_prompt]
    E --> R[route]
    R --> T[Tool Execution]
    T --> M[MemoryStore.save]
    M --> O[trace events]
    O --> RESP[Response]

    style API fill:#1976d2,color:#fff
    style E fill:#1976d2,color:#fff
    style R fill:#1976d2,color:#fff
    style T fill:#ff9800,color:#fff
    style M fill:#ff9800,color:#fff
    style O fill:#ff9800,color:#fff
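
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the `Route` dataclass, the `TOOLS` registry, the toy `add` tool, the `trace` helper, and the event names are all assumptions made for the example. Only the names `handle_prompt`, `route`, and `MemoryStore.save` come from the diagram.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Route:
    """Router decision: which tool to call and with which arguments."""
    tool: str
    args: dict[str, Any]


@dataclass
class MemoryStore:
    """In-memory stand-in for the real store (assumed interface)."""
    records: list[tuple[str, str]] = field(default_factory=list)

    def save(self, prompt: str, response: str) -> None:
        self.records.append((prompt, response))


# Hypothetical tool registry: tool name -> callable.
TOOLS: dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}

# Stand-in for emitted trace events.
TRACE: list[dict[str, Any]] = []


def trace(event: str, **data: Any) -> None:
    TRACE.append({"event": event, **data})


def route(prompt: str) -> Optional[Route]:
    """Toy router that only recognizes 'add <x> <y>'."""
    parts = prompt.split()
    if len(parts) == 3 and parts[0] == "add":
        try:
            return Route("add", {"a": float(parts[1]), "b": float(parts[2])})
        except ValueError:
            return None
    return None


def handle_prompt(prompt: str, memory: MemoryStore) -> str:
    """Route, execute, persist, trace, and respond, in that order."""
    decision = route(prompt)
    if decision is None:
        response = "No suitable tool found"
    else:
        response = str(TOOLS[decision.tool](**decision.args))
    memory.save(prompt, response)
    trace("response_ready", prompt=prompt, response=response)
    return response
```

Because every step runs inside `handle_prompt`, the API layer only needs to pass the prompt in and return the string it gets back.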

Components in this repository

| Component | File | Responsibility |
| --- | --- | --- |
| API boundary | src/api/server.py | Receives prompt payload and returns model or tool output |
| Orchestrator | src/core/engine.py | Coordinates route, execute, memory, and trace |
| Router | src/llm/router.py | Chooses tool and arguments from prompt intent |
| Tool | src/tools/calculator.py | Performs deterministic business logic |
| Memory | src/memory/store.py | Persists prompt and response pairs |
| Observability | src/observability/tracer.py | Emits trace events for debugging and analysis |

Deterministic tool confidence score

A simple way to reason about whether a route should execute is to score how confident the system is in the tool choice.

\[ C = \alpha I + \beta S + \gamma H \]

| Symbol | Meaning |
| --- | --- |
| \(C\) | Overall confidence score for selecting a tool |
| \(I\) | Intent match quality between prompt and tool purpose |
| \(S\) | Schema compatibility score of extracted arguments |
| \(H\) | Historical success score for this tool in similar prompts |
| \(\alpha, \beta, \gamma\) | Weights that sum to 1 and reflect your routing priorities |

Worked example: if \(I = 0.9\), \(S = 0.8\), \(H = 0.7\), and the weights are \(\alpha = 0.5\), \(\beta = 0.3\), \(\gamma = 0.2\), then \(C = 0.45 + 0.24 + 0.14 = 0.83\), which is usually strong enough to execute automatically.
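
A minimal sketch of the weighted score, using the inputs from the worked example. It assumes each component is normalized to the range 0 to 1. The function names, the range check, and the `EXECUTE_THRESHOLD` value of 0.8 are illustrative assumptions, not values the repository defines.

```python
import math

# Hypothetical cut-off above which a route executes automatically.
EXECUTE_THRESHOLD = 0.8


def confidence(intent: float, schema: float, history: float,
               weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Return C = alpha * I + beta * S + gamma * H."""
    alpha, beta, gamma = weights
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("weights must sum to 1")
    # Assumption: every component score lies in [0, 1].
    for name, value in (("intent", intent), ("schema", schema), ("history", history)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be between 0 and 1")
    return alpha * intent + beta * schema + gamma * history


def should_execute(score: float, threshold: float = EXECUTE_THRESHOLD) -> bool:
    """Execute automatically only when the score reaches the threshold."""
    return score >= threshold


score = confidence(intent=0.9, schema=0.8, history=0.7)
print(f"C = {score:.2f}, execute = {should_execute(score)}")
```

Keeping the threshold separate from the scoring function lets you tune routing strictness without touching the weights.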

Common failure modes and fixes

| Failure mode | Signal | Mitigation |
| --- | --- | --- |
| Tool chosen but wrong args | Validation errors in tool call | Add stronger schema extraction and defaults |
| No tool selected | Frequent "No suitable tool found" responses | Expand intent patterns and fallback behavior |
| Memory noise | Context quality degrades over time | Add retention windows and relevance filters |
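
As one illustration of the first mitigation, the sketch below validates extracted arguments against a small schema and fills in defaults before a tool runs. The schema format, the `REQUIRED` sentinel, `CALC_SCHEMA`, and `validate_args` are hypothetical names chosen for this example. A real project might use a validation library instead.

```python
from typing import Any

# Sentinel marking an argument that has no default (hypothetical convention).
REQUIRED = object()

# Hypothetical schema for a calculator-style tool: name -> (type, default).
CALC_SCHEMA: dict[str, tuple[type, Any]] = {
    "a": (float, REQUIRED),
    "b": (float, REQUIRED),
    "op": (str, "add"),
}


def validate_args(raw: dict[str, Any],
                  schema: dict[str, tuple[type, Any]]) -> dict[str, Any]:
    """Coerce arguments to the declared types and apply defaults."""
    unknown = set(raw) - set(schema)
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    clean: dict[str, Any] = {}
    for name, (kind, default) in schema.items():
        if name in raw:
            try:
                clean[name] = kind(raw[name])
            except (TypeError, ValueError) as exc:
                raise ValueError(
                    f"argument {name!r} is not a valid {kind.__name__}"
                ) from exc
        elif default is REQUIRED:
            raise ValueError(f"missing required argument {name!r}")
        else:
            clean[name] = default
    return clean
```

Rejecting bad arguments before execution turns a tool-side failure into a clear validation error that the orchestrator can trace or use to trigger a fallback.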

Why not call tools directly from the API layer?

Keeping orchestration in handle_prompt centralizes governance. It becomes easier to enforce validation, tracing, and memory policy in one place.

What is the minimum production-ready component set?

You need an API boundary, an orchestrator, a router, tool contracts, a memory policy, and observability. Skipping observability creates blind spots during incidents.

How does this relate to classic microservices?

Tools can map to microservice calls, but MCP adds model-mediated intent and schema-driven invocation. You still need the same reliability and security standards as service-to-service systems.
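
One reliability practice that carries over from service-to-service systems is bounded retry with backoff around a tool call. The sketch below is an assumption-laden illustration: `call_with_retry`, its parameters, and the choice to retry only on `ConnectionError` are hypothetical and not part of MCP or this repository.

```python
import time
from typing import Any, Callable


def call_with_retry(tool: Callable[..., Any], args: dict[str, Any],
                    attempts: int = 3, base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call a tool, retrying transient connection failures with backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return tool(**args)
        except ConnectionError as exc:
            last_error = exc
            # Back off exponentially, but do not sleep after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"tool failed after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can substitute a function that records delays instead of waiting.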
