03 · Retrieval-Augmented Generation (RAG)
How agents find relevant context from large codebases, wikis, and documentation while keeping answers grounded rather than hallucinated.
The Problem RAG Solves
LLMs have two fundamental limitations for dev automation:
- Knowledge boundary — they were trained on public data up to a cutoff, so they know nothing about your codebase, your internal APIs, or your team's decisions
- Context window limits — you can't fit an entire codebase into one prompt
RAG solves both: instead of baking knowledge into the model, you retrieve relevant context at query time and inject it into the prompt.
```mermaid
graph LR
    A[JIRA Ticket] --> B[Query Encoder]
    B --> C[Vector Search]
    D[Code Index + Docs] --> C
    C --> E[Top-K Relevant Chunks]
    E --> F[Augmented Prompt]
    A --> F
    F --> G[LLM]
    G --> H[Code Change / Analysis]
```
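To make the flow concrete, here is a minimal, self-contained sketch of the retrieve-then-augment loop. Everything in it is a toy assumption: the three-line corpus, the bag-of-words `embed()` stand-in for a real embedding model, and the ticket text. A production pipeline swaps in real embeddings and a vector database.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

corpus = [  # three made-up knowledge chunks
    "PaymentService.charge() validates the card before calling the gateway",
    "InvoiceController exposes POST /invoices and GET /invoices/{id}",
    "RetryConfig caps outbound gateway retries at 3 attempts",
]
index = [(doc, embed(doc)) for doc in corpus]             # indexing

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)                                      # embed the query
    ranked = sorted(index, key=lambda d: similarity(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]                 # top-K chunks

ticket = "Why does charging a card retry the gateway three times?"
context = "\n".join(retrieve(ticket))
prompt = f"Context:\n{context}\n\nTicket: {ticket}\nAnswer using only the context."
# `prompt` now goes to whatever LLM client the agent uses.
```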
RAG vs. Fine-Tuning
| Approach | Best For | Limitations |
|---|---|---|
| RAG | Frequently changing knowledge (code, docs, tickets) | Retrieval quality depends on chunks and embeddings |
| Fine-tuning | Consistent style, patterns, and behaviour | Expensive to retrain, static knowledge snapshot |
| RAG + Fine-tuning | Style and conventions from fine-tuning, facts from RAG; highest quality | Most complex to build and operate |
Default to RAG
For codebase knowledge, always use RAG. Code changes constantly — a model fine-tuned on a codebase snapshot is outdated the moment the next PR merges.
The RAG Pipeline
| Stage | What Happens | Tools |
|---|---|---|
| 1. Indexing | Split documents → embed → store in vector DB | LangChain, LlamaIndex, custom |
| 2. Retrieval | Embed query → similarity search → top-K chunks | Pinecone, Weaviate, pgvector |
| 3. Reranking | Re-score top-K chunks for relevance to the query | Cohere Rerank, cross-encoder |
| 4. Augmentation | Inject retrieved chunks into the LLM prompt | LangChain PromptTemplate |
| 5. Generation | LLM answers grounded in retrieved context | OpenAI, Anthropic, Ollama |
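A sketch of stages 1-5 wired together with LangChain, assuming the `langchain-openai`, `langchain-community`, `langchain-text-splitters`, and `faiss-cpu` packages plus an `OPENAI_API_KEY` in the environment. The corpus file and question are illustrative, and the reranking stage is sketched as a comment.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

raw_docs = open("docs/payment-service.md").read()   # hypothetical corpus file

# 1. Indexing: split -> embed -> store
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_text(raw_docs)
store = FAISS.from_texts(chunks, OpenAIEmbeddings())

# 2. Retrieval: embed query -> similarity search -> top-K chunks
retriever = store.as_retriever(search_kwargs={"k": 5})
hits = retriever.invoke("How are failed card charges retried?")

# 3. Reranking (omitted here): re-score `hits` with a cross-encoder or
#    Cohere Rerank and keep only the best few before building the prompt.

# 4. Augmentation: inject retrieved chunks into the prompt
context = "\n\n".join(doc.page_content for doc in hits)
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    "Question: How are failed card charges retried?"
)

# 5. Generation: the LLM answers grounded in retrieved context
print(ChatOpenAI(model="gpt-4o-mini").invoke(prompt).content)
```

The same shape holds with LlamaIndex or a hand-rolled pipeline; the stages, not the framework, are the point.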
What to Index for Dev Automation
| Corpus | Content | Value |
|---|---|---|
| Source code | Java classes, interfaces, Spring beans | Find the right service and method |
| Test code | JUnit tests, Playwright scripts | Understand expected behaviour |
| OpenAPI specs | Swagger YAML/JSON | Understand API contracts |
| JIRA history | Past bugs, resolutions, ADRs | Find similar past issues |
| Confluence / docs | Architecture docs, runbooks | Understand system design |
| Git history | Commit messages, PR descriptions | Understand change intent |
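However these corpora are stored, tagging each chunk with its origin pays off at query time: retrieval can filter by corpus, service, or language before the vector search runs. A hypothetical record shape (all field names and values are illustrative):

```python
# Hypothetical index record: the chunk text plus metadata for pre-filtering.
chunk = {
    "text": "public Invoice createInvoice(CreateInvoiceRequest req) { ... }",
    "metadata": {
        "corpus": "source_code",   # source_code | test_code | openapi | jira | docs | git
        "service": "billing-service",
        "language": "java",
        "path": "src/main/java/com/acme/billing/InvoiceService.java",
    },
}
```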
Retrieval Quality Factors
Good RAG is mostly about retrieval quality, not generation quality. The LLM is only as good as the context it receives.
| Factor | Impact |
|---|---|
| Chunk size | Too small → context fragmented; too large → irrelevant content included |
| Chunk overlap | Overlap prevents splitting important context across boundaries |
| Embedding model quality | Code-specific models outperform general ones for source code |
| Metadata filtering | Filter by service name, language, file type before vector search |
| Hybrid search | Keyword + vector combined — best recall for uncommon identifiers |
| Reranking | A cross-encoder reranker dramatically improves top-5 relevance |
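Hybrid search is commonly implemented with reciprocal rank fusion (RRF): run the keyword and vector retrievers separately, then merge the two rankings. A self-contained sketch, where the two ranked lists of document IDs are made-up inputs:

```python
# Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per doc;
# k (conventionally 60) dampens the influence of the very top ranks.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranked = ["RetryConfig.java", "InvoiceService.java", "README.md"]
vector_ranked = ["PaymentService.java", "RetryConfig.java", "InvoiceService.java"]
print(rrf([keyword_ranked, vector_ranked]))
# RetryConfig.java comes out on top: both retrievers rank it highly, so an
# exact identifier match and semantic similarity reinforce each other.
```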
→ Deep Dive: RAG Pipeline — Indexing strategies, chunking code, hybrid search
→ Deep Dive: Advanced RAG Patterns — HyDE, self-query, multi-hop retrieval
RAG for Code: Special Considerations
Code is not prose. Naive text chunking breaks class and method boundaries.
| Language Construct | Recommended Chunk Boundary |
|---|---|
| Java class | One chunk per class file (or per public method for large classes) |
| Interface + implementations | Keep interface and its primary implementation in the same chunk |
| Spring Boot controller | One chunk per controller with all its endpoint methods |
| Playwright test | One chunk per `test()` or `describe()` block |
AST-Based Chunking
Production code indexers parse the AST (Abstract Syntax Tree) and split at method and class boundaries rather than by character count. Tree-sitter provides parsers for Java, TypeScript, Python, and many other languages.
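A sketch of such a chunker, assuming the `tree-sitter` Python bindings (0.22 or later) and the `tree-sitter-java` grammar package; the size budget and file name are illustrative, while the node types (`class_declaration`, `class_body`, `method_declaration`) come from the Java grammar.

```python
# Hypothetical AST-based chunker: one chunk per Java class, falling back
# to one chunk per method when the class exceeds a byte budget.
# Assumes: pip install tree-sitter tree-sitter-java  (bindings 0.22+)
import tree_sitter_java
from tree_sitter import Language, Parser

parser = Parser(Language(tree_sitter_java.language()))

def chunk_java(source: bytes, max_bytes: int = 4000) -> list[str]:
    tree = parser.parse(source)
    chunks: list[str] = []
    for node in tree.root_node.children:
        if node.type != "class_declaration":
            continue
        if node.end_byte - node.start_byte <= max_bytes:
            # Small class: the whole declaration is one chunk.
            chunks.append(source[node.start_byte:node.end_byte].decode())
        else:
            # Large class: emit one chunk per method declaration.
            body = next(c for c in node.children if c.type == "class_body")
            chunks.extend(
                source[m.start_byte:m.end_byte].decode()
                for m in body.children
                if m.type == "method_declaration"
            )
    return chunks

# Usage (file name is illustrative):
# chunks = chunk_java(open("InvoiceService.java", "rb").read())
```

Because chunks are cut on exact byte offsets, each one maps cleanly back to a file path and line range, which is exactly the metadata the retrieval filters above rely on.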