# 01 · AI & LLM Foundations
High-level overview of the AI landscape relevant to software development automation.
This section covers what you need to know about LLMs before building agentic systems.
## The AI Landscape
| Category | What It Is | Examples |
|---|---|---|
| Machine Learning | Systems that learn from data to make predictions | Scikit-learn, XGBoost, time-series forecasting |
| Deep Learning | Neural networks with many layers, especially for perception | Image classification, speech recognition |
| Large Language Models (LLMs) | Deep learning models trained on text at massive scale | GPT-4, Claude, Gemini, LLaMA |
| Generative AI | AI that generates new content — text, code, images | ChatGPT, Copilot, Stable Diffusion |
| Agentic AI | AI that autonomously plans and executes multi-step tasks | AutoGPT, LangGraph agents, Devin |
### Focus of This Guide
This guide focuses on LLMs and Agentic AI as applied to software development workflows. ML/deep learning fundamentals are assumed knowledge.
## How LLMs Work — The 30-Second Version
An LLM is a next-token prediction machine trained on billions of text documents. Given input text (the prompt), it predicts the most likely continuation.
```mermaid
graph LR
    A[Input Text] --> B[Tokenizer]
    B --> C[Transformer Layers]
    C --> D[Logits over vocabulary]
    D --> E[Sampled Token]
    E --> F[Appended to output]
    F --> B
```
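The loop above can be sketched in a few lines. This is a toy illustration, not a real model: `TOY_LOGITS` is a hand-made lookup table standing in for a transformer's output layer, while a real LLM computes logits over a vocabulary of ~100k tokens with billions of parameters. It does show the two mechanics that matter: the autoregressive feedback loop and temperature-scaled sampling.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities; lower temperature sharpens
    the distribution toward the highest-logit token."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy "model": made-up logits for the next token given the previous token.
TOY_LOGITS = {
    "the": {"cat": 2.0, "dog": 1.5, "code": 0.5},
    "cat": {"sat": 2.5, "ran": 1.0, "the": 0.1},
    "sat": {"down": 2.0, "the": 0.2, "ran": 0.1},
}

def generate(prompt_token, steps, temperature=0.7, seed=0):
    """Autoregressive loop: sample a token, append it, feed it back as context."""
    rng = random.Random(seed)
    out = [prompt_token]
    for _ in range(steps):
        logits = TOY_LOGITS.get(out[-1])
        if logits is None:                   # no known continuation: stop early
            break
        tokens = list(logits)
        probs = softmax_with_temperature(list(logits.values()), temperature)
        out.append(rng.choices(tokens, weights=probs)[0])
    return " ".join(out)

# Very low temperature makes sampling effectively greedy (always the argmax).
print(generate("the", 3, temperature=0.01))
```

At `temperature=0.01` the highest-logit token wins essentially every time; raise it toward 1.0 and the lower-logit continuations start getting sampled — the same knob the "Temperature" row below describes.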
What makes LLMs powerful for automation:
- They can follow instructions expressed in natural language
- They can generate structured output (JSON, code, YAML)
- They have broad world knowledge baked in at training time
- They can reason step by step when prompted correctly
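The structured-output point deserves a caveat: models often wrap JSON in prose, so automation code should extract and validate it rather than trust the raw response. A minimal sketch, assuming a hypothetical model reply (the `raw` string and required keys are invented for illustration):

```python
import json
import re

def extract_json(model_text):
    """Pull the first JSON object out of an LLM response, tolerating
    surrounding prose — a common failure mode in practice."""
    match = re.search(r"\{.*\}", model_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

def validate_keys(payload, required):
    """Minimal schema check: fail fast if the model omitted a field."""
    missing = [k for k in required if k not in payload]
    if missing:
        raise ValueError(f"model output missing keys: {missing}")
    return payload

# Hypothetical raw response from a model asked for acceptance criteria as JSON.
raw = ('Sure, here is the extracted result:\n'
       '{"ticket": "PROJ-123", "criteria": ["login works", "errors are logged"]}\n'
       'Let me know if you need anything else.')
result = validate_keys(extract_json(raw), ["ticket", "criteria"])
print(result["criteria"])
```

Production systems typically go further (JSON Schema validation, retry-on-parse-failure), but the extract-then-validate shape stays the same.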
→ Deep Dive: How LLMs Work — Transformer architecture, attention, context windows, temperature
## Key Concepts at a Glance
| Concept | What It Means | Why It Matters |
|---|---|---|
| Token | Unit of text the model processes (~¾ of a word) | Context window = max tokens in + out |
| Context Window | How much text the model can "see" at once | Limits how much code/docs an agent can process |
| Embedding | Numerical vector representation of text | Enables semantic search and RAG |
| Temperature | How random vs. deterministic the output is | Low temp = consistent code, high temp = creative |
| System Prompt | The instructions given to the model before user input | Controls agent persona and safety boundaries |
| Tool / Function Call | Model outputs a structured function invocation | How agents take actions in the real world |
| Grounding | Attaching external context (docs, code) to a prompt | Reduces hallucination, enables RAG |
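The "Embedding" row is the one that powers semantic search and RAG: similar texts map to nearby vectors, and "nearby" is usually measured with cosine similarity. A toy sketch with hand-made 3-dimensional vectors (real embeddings from a model have hundreds or thousands of dimensions, and the document texts here are invented):

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made stand-ins for embedding vectors of three ticket titles.
docs = {
    "fix null pointer in login": [0.9, 0.1, 0.0],
    "update payment gateway":    [0.1, 0.9, 0.1],
    "login crashes on startup":  [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of the query "login bug"

# Semantic search = rank documents by similarity to the query vector.
best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
print(best)
```

A vector database (FAISS, pgvector) does exactly this ranking, just with approximate-nearest-neighbor indexes so it scales past brute force.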
## LLMs in Software Development
LLMs can assist at every phase of the SDLC:
| SDLC Phase | AI Capability | Example |
|---|---|---|
| Requirements | Understand natural language specs | Read JIRA ticket, extract acceptance criteria |
| Design | Suggest architectural patterns | Recommend microservice structure given domain |
| Implementation | Generate code, refactor, explain | Write Spring Boot service layer from spec |
| Testing | Generate unit tests, analyze failures | Create JUnit tests, RCA a Playwright failure |
| Review | Summarize PR diffs, flag issues | Comment on security risks in changed code |
| Operations | Analyze logs, suggest fixes | Identify root cause from stack trace |
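In each SDLC row, the integration point is the same: assemble a prompt that combines instructions with grounding context, then send it to a model API. A sketch of that assembly step for the "Review" row — the function name, tag format, and guideline text are all illustrative choices, not a prescribed API:

```python
def build_review_prompt(diff: str, guidelines: str) -> str:
    """Assemble a code-review prompt: role instructions, team guidelines,
    and the grounding context (the diff) the model should comment on."""
    return (
        "You are a senior reviewer. Flag security risks and style issues.\n"
        "Follow these team guidelines:\n"
        f"{guidelines}\n\n"
        "Review only the changes in this diff:\n"
        "<diff>\n"
        f"{diff}\n"
        "</diff>\n"
        "Respond as a JSON list of {file, line, comment} objects."
    )

# Hypothetical one-hunk diff for illustration.
diff = ('--- a/auth.py\n'
        '+++ b/auth.py\n'
        '+password = request.args.get("password")')
print(build_review_prompt(diff, "- No secrets in query strings"))
```

Wrapping the untrusted content in delimiters like `<diff>…</diff>` and asking for structured output makes the response machine-checkable — the same pattern applies to log analysis, test generation, and ticket parsing.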
## What LLMs Cannot Do (Without Help)
| Limitation | Solution |
|---|---|
| No access to your codebase | RAG + code indexing |
| Can't run code | Tool use (code interpreter, sandbox) |
| Hallucinate facts and APIs | Retrieval grounding + output validation |
| No memory across sessions | External memory stores (Redis, vector DB) |
| Can't call APIs | MCP Servers / function calling |
| Don't know recent events | Real-time retrieval tools |
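The "Can't run code" and "Can't call APIs" rows share one solution shape: the model emits a structured function call, and a harness you control executes it. A minimal dispatch sketch — the tool names are stubs, and the message format is a simplified stand-in loosely modeled on real provider tool-use payloads:

```python
import json

# Registry of tools the agent may invoke. The model never runs code itself;
# it only emits a structured request that this harness executes.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",            # stub
    "run_tests": lambda suite: {"suite": suite, "passed": True},  # stub
}

def dispatch(model_output: str):
    """Parse a function-call message from the model and route it to a tool."""
    call = json.loads(model_output)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"model requested unknown tool: {name}")
    return TOOLS[name](**args)

print(dispatch('{"name": "run_tests", "arguments": {"suite": "unit"}}'))
```

The allow-list is the important design choice: the model can only reach tools you registered, which is also where sandboxing and permission checks belong.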
## LLM Provider Landscape
| Provider | Model Family | Strengths | API |
|---|---|---|---|
| OpenAI | GPT-4o, o1, o3 | Code generation, function calling, reasoning | REST, Python SDK |
| Anthropic | Claude 3.5 Sonnet, Claude 4 | Long context, precise instruction following, safety | REST, Python SDK |
| Google | Gemini 1.5 Pro, 2.0 Flash | Very long context (1M tokens), multimodal | REST, Vertex AI |
| Meta | LLaMA 3.x | Open source, self-hosted | Hugging Face, Ollama |
| Mistral | Mistral Large, Mixtral | Fast, multilingual, open weights | REST |
| Cohere | Command R+ | Strong RAG-optimized, reranking | REST |
### Model Selection for Dev Automation
For code generation and reasoning tasks (like our JIRA→PR use case), Claude Sonnet and GPT-4o offer the best balance of instruction following, code quality, and context length. For cost-sensitive CI automation, self-hosted Mistral or LLaMA 3 models are a viable alternative.
→ Deep Dive: How LLMs Work — Transformer internals, attention, context windows
→ Deep Dive: Embeddings & Vector Search — Semantic search, FAISS, pgvector
→ Deep Dive: Prompt Engineering — System prompts, few-shot, chain-of-thought