Core Concepts

An MCP server is easier to reason about when you treat it as a controlled execution pipeline, not just a chat endpoint.

The model suggests actions, but your orchestrator decides whether and how each action is executed.

For protocol-level background, see the MCP architecture documentation: https://modelcontextprotocol.io/docs/learn/architecture#transport-layer

MCP request lifecycle

flowchart LR
    U[User Prompt] --> API[FastAPI /chat]
    API --> E[handle_prompt]
    E --> R[route]
    R --> T[Tool Execution]
    T --> M[MemoryStore.save]
    M --> O[trace events]
    O --> RESP[Response]

    style API fill:#1976d2,color:#fff
    style E fill:#1976d2,color:#fff
    style R fill:#1976d2,color:#fff
    style T fill:#ff9800,color:#fff
    style M fill:#ff9800,color:#fff
    style O fill:#ff9800,color:#fff

Components in this repository

Component	File	Responsibility
API boundary	`src/api/server.py`	Receives prompt payload and returns model or tool output
Orchestrator	`src/core/engine.py`	Coordinates route, execute, memory, and trace
Router	`src/llm/router.py`	Chooses tool and arguments from prompt intent
Tool	`src/tools/calculator.py`	Performs deterministic business logic
Memory	`src/memory/store.py`	Persists prompt and response pairs
Observability	`src/observability/tracer.py`	Emits trace events for debugging and analysis
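The way these components hand off to one another can be sketched in a few lines. The code below is a minimal illustration, not the repository's implementation: the names `handle_prompt`, `route`, and `MemoryStore.save` come from the diagram above, while the `Tracer` class, the `TOOLS` registry, the toy `add` tool, and the "add X Y" prompt format are assumptions made for the example. It also emits trace events at every stage rather than only after the memory write, which is one possible reading of the diagram.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class MemoryStore:
    """In-memory stand-in for the store in src/memory/store.py (assumed shape)."""
    records: list = field(default_factory=list)

    def save(self, prompt: str, response: str) -> None:
        self.records.append((prompt, response))


@dataclass
class Tracer:
    """Hypothetical tracer that collects events in a list instead of exporting them."""
    events: list = field(default_factory=list)

    def emit(self, name: str, **attrs: Any) -> None:
        self.events.append({"event": name, **attrs})


def add(a: float, b: float) -> float:
    """Toy deterministic tool standing in for the calculator."""
    return a + b


# Hypothetical registry mapping tool names to callables.
TOOLS: dict = {"add": add}


def route(prompt: str) -> Optional[tuple]:
    """Assumed router: recognises only prompts of the form 'add X Y'."""
    parts = prompt.split()
    if len(parts) == 3 and parts[0].lower() == "add":
        try:
            return "add", {"a": float(parts[1]), "b": float(parts[2])}
        except ValueError:
            return None
    return None


def handle_prompt(prompt: str, memory: MemoryStore, tracer: Tracer) -> str:
    """Route, execute, persist, and trace in one place."""
    tracer.emit("prompt_received", prompt=prompt)
    decision = route(prompt)
    if decision is None:
        response = "No suitable tool found"
        tracer.emit("route_missed")
    else:
        tool_name, args = decision
        tracer.emit("tool_selected", tool=tool_name, args=args)
        tool: Callable[..., Any] = TOOLS[tool_name]
        response = str(tool(**args))
        tracer.emit("tool_executed", tool=tool_name)
    memory.save(prompt, response)
    tracer.emit("memory_saved")
    return response
```

Calling `handle_prompt("add 2 3", MemoryStore(), Tracer())` returns the string `"5.0"`, and an unrecognised prompt falls through to the "No suitable tool found" response while still being saved and traced.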

Deterministic tool confidence score

A simple way to reason about whether a route should execute is to score how confident the system is in the tool choice.

\[ C = \alpha I + \beta S + \gamma H \]

Symbol	Meaning
`C`	Overall confidence score for selecting a tool
`I`	Intent match quality between prompt and tool purpose
`S`	Schema compatibility score of extracted arguments
`H`	Historical success score for this tool in similar prompts
`alpha, beta, gamma`	Weights that sum to 1 and reflect your routing priorities

Worked example: if I=0.9, S=0.8, H=0.7, and the weights are 0.5, 0.3, 0.2, then C = 0.5(0.9) + 0.3(0.8) + 0.2(0.7) = 0.45 + 0.24 + 0.14 = 0.83, which is usually strong enough to execute automatically.
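The weighted sum is short enough to compute directly. The sketch below is illustrative only: the function names `confidence` and `should_execute`, the requirement that each input score lies in [0, 1], and the 0.8 auto-execute threshold are assumptions for this example, not values defined by the repository.

```python
import math


def confidence(intent: float, schema: float, history: float,
               weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Return C = alpha*I + beta*S + gamma*H for weights that sum to 1."""
    alpha, beta, gamma = weights
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("weights must sum to 1")
    # Assumption: every component score is normalised to the range [0, 1].
    for name, value in (("intent", intent), ("schema", schema), ("history", history)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return alpha * intent + beta * schema + gamma * history


def should_execute(score: float, threshold: float = 0.8) -> bool:
    """Hypothetical gate: run the tool automatically only above the threshold."""
    return score >= threshold


score = confidence(0.9, 0.8, 0.7)
print(f"{score:.2f}")         # 0.83
print(should_execute(score))  # True
```

Validating the weights up front keeps scores comparable across tools; if the weights drift away from a sum of 1, any fixed threshold stops meaning the same thing.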

Common failure modes and fixes

Failure mode	Signal	Mitigation
Tool chosen but wrong args	Validation errors in tool call	Add stronger schema extraction and defaults
No tool selected	Frequent "No suitable tool found" responses	Expand intent patterns and fallback behavior
Memory noise	Context quality degrades over time	Add retention windows and relevance filters
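For the first row of the table, one way to catch wrong arguments before they reach a tool is to validate them against a small schema and fill in defaults. The sketch below assumes a hypothetical schema format (field name mapped to an expected type and a default) and a hypothetical `validate_args` helper; a real project might use a dedicated validation library instead.

```python
from typing import Any

# Sentinel marking a field that has no default and must be supplied.
REQUIRED = object()

# Hypothetical schema for an 'add' tool: name -> (expected type, default).
ADD_SCHEMA: dict = {
    "a": (float, REQUIRED),
    "b": (float, 0.0),
}


def validate_args(raw: dict, schema: dict) -> dict:
    """Coerce extracted arguments to the schema types and apply defaults."""
    unknown = set(raw) - set(schema)
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")

    clean: dict = {}
    for name, (expected, default) in schema.items():
        if name in raw:
            try:
                clean[name] = expected(raw[name])
            except (TypeError, ValueError) as exc:
                raise ValueError(
                    f"argument {name!r} is not a valid {expected.__name__}"
                ) from exc
        elif default is REQUIRED:
            raise ValueError(f"missing required argument {name!r}")
        else:
            clean[name] = default
    return clean
```

With this helper, `validate_args({"a": "2"}, ADD_SCHEMA)` yields `{"a": 2.0, "b": 0.0}`, while a missing `a` or a non-numeric value raises `ValueError` with a message the orchestrator can log as a validation signal.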

Why not call tools directly from the API layer?

Keeping orchestration in `handle_prompt` centralizes governance: validation, tracing, and memory policy are enforced in one place instead of being duplicated across API handlers.

What is the minimum production-ready component set?

You need an API boundary, an orchestrator, a router, tool contracts, a memory policy, and observability. Skipping observability creates blind spots during incidents.

How does this relate to classic microservices?

Tools can map to microservice calls, but MCP adds model-mediated intent and schema-driven invocation. You still need the same reliability and security standards as service-to-service systems.

--8<-- "_abbreviations.md"