01 · AI & LLM Foundations

High-level overview of the AI landscape relevant to software development automation.
This section covers what you need to know about LLMs before building agentic systems.

The AI Landscape

Category	What It Is	Examples
Machine Learning	Systems that learn from data to make predictions	Scikit-learn, XGBoost, time-series forecasting
Deep Learning	Neural networks with many layers, especially for perception	Image classification, speech recognition
Large Language Models (LLMs)	Deep learning models trained on text at massive scale	GPT-4, Claude, Gemini, LLaMA
Generative AI	AI that generates new content — text, code, images	ChatGPT, Copilot, Stable Diffusion
Agentic AI	AI that autonomously plans and executes multi-step tasks	AutoGPT, LangGraph agents, Devin

Focus of This Guide

This guide focuses on LLMs and Agentic AI as applied to software development workflows. ML/deep learning fundamentals are assumed knowledge.

How LLMs Work — The 30-Second Version

An LLM is a next-token prediction machine trained on billions of text documents. Given input text (the prompt), it predicts the most likely continuation.

graph LR
    A[Input Text] --> B[Tokenizer]
    B --> C[Transformer Layers]
    C --> D[Logits over vocabulary]
    D --> E[Sampled Token]
    E --> F[Appended to output]
    F --> B

What makes LLMs powerful for automation:

They can follow instructions expressed in natural language
They can generate structured output (JSON, code, YAML)
They have broad world knowledge baked in at training time
They can reason step by step when prompted correctly

→ Deep Dive: How LLMs Work — Transformer architecture, attention, context windows, temperature

Key Concepts at a Glance

Concept	What It Means	Why It Matters
Token	Unit of text the model processes (~¾ of a word)	Context window = max tokens in + out
Context Window	How much text the model can "see" at once	Limits how much code/docs an agent can process
Embedding	Numerical vector representation of text	Enables semantic search and RAG
Temperature	How random vs. deterministic the output is	Low temp = consistent code, high temp = creative
System Prompt	The instructions given to the model before user input	Controls agent persona and safety boundaries
Tool / Function Call	Model outputs a structured function invocation	How agents take actions in the real world
Grounding	Attaching external context (docs, code) to a prompt	Reduces hallucination, enables RAG

LLMs in Software Development

LLMs can assist at every phase of the SDLC:

SDLC Phase	AI Capability	Example
Requirements	Understand natural language specs	Read JIRA ticket, extract acceptance criteria
Design	Suggest architectural patterns	Recommend microservice structure given domain
Implementation	Generate code, refactor, explain	Write Spring Boot service layer from spec
Testing	Generate unit tests, analyze failures	Create JUnit tests, RCA a Playwright failure
Review	Summarize PR diffs, flag issues	Comment on security risks in changed code
Operations	Analyze logs, suggest fixes	Identify root cause from stack trace

What LLMs Cannot Do (Without Help)

Limitation	Solution
No access to your codebase	RAG + code indexing
Can't run code	Tool use (code interpreter, sandbox)
Hallucinate facts and APIs	Retrieval grounding + output validation
No memory across sessions	External memory stores (Redis, vector DB)
Can't call APIs	MCP Servers / function calling
Don't know recent events	Real-time retrieval tools

LLM Provider Landscape

Provider	Model Family	Strengths	API
OpenAI	GPT-4o, o1, o3	Code generation, function calling, reasoning	REST, Python SDK
Anthropic	Claude 3.5 Sonnet, Claude 4	Long context, precise instruction following, safety	REST, Python SDK
Google	Gemini 1.5 Pro, 2.0 Flash	Very long context (1M tokens), multimodal	REST, Vertex AI
Meta	LLaMA 3.x	Open source, self-hosted	Hugging Face, Ollama
Mistral	Mistral Large, Mixtral	Fast, multilingual, open weights	REST
Cohere	Command R+	Strong RAG-optimized, reranking	REST

Model Selection for Dev Automation

For code generation and reasoning tasks (like our JIRA→PR use case), Claude Sonnet or GPT-4o offer the best balance of instruction following, code quality, and context length. For cost-sensitive CI automation, Mistral or LLaMA 3 self-hosted are viable.

→ Deep Dive: How LLMs Work — Transformer internals, attention, context windows
→ Deep Dive: Embeddings & Vector Search — Semantic search, FAISS, pgvector
→ Deep Dive: Prompt Engineering — System prompts, few-shot, chain-of-thought

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search