Dense Retrieval: Semantic Search with Embeddings

Dense retrieval uses embeddings to find documents with similar meaning to the query.

You've already learned the foundations:

Embeddings convert text → vectors
Cosine similarity measures vector similarity
HNSW indexes find similar vectors quickly

This page applies those concepts to retrieval.

How Dense Retrieval Works

User Query: "What is the status of my order?"
    ↓
[Embedding Model: Sentence-BERT]
    ↓
Query Embedding: [0.12, -0.45, 0.73, ..., 0.02]
    ↓
[Vector Database Index: HNSW]
    ↓
Step 1: Find 1000 similar vectors
Step 2: Compute exact cosine similarity to top 1000
Step 3: Return top-10 results
    ↓
Top Results:
1. "Your order #2401 will arrive Tuesday" (0.87 similarity)
2. "Track your order status here" (0.85 similarity)
3. "This is a cat picture" (0.23 similarity)

Pros of Dense Retrieval

✅ Captures meaning: "What is my order status?" matches documents about "tracking" and "shipment"
✅ Semantic understanding: Finds documents with similar intent, not just keywords
✅ Cross-lingual: Works across languages (for multilingual models)
✅ Flexible: Works for open-ended questions

Cons of Dense Retrieval

❌ Terrible for exact matches: Order #1766 ≈ Order #1767 in embedding space
❌ Ambiguous queries fail: "Bank" matches both financial and river banks
❌ Limited context: Embedding model has maximum input length (usually 512-2048 tokens)
❌ Requires training data: Need domain-specific embedding model for best results

Example: Simple Dense Retrieval

from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.paginar import cosine_similarity

# Step 1: Create embeddings for all documents
documents = [
    "Your order #1766 has been confirmed",
    "Your order #1767 is being processed",
    "Your order #1765 will arrive tomorrow",
    "Here's a recipe for cat food",
    "The bank is open 9am-5pm weekdays"
]

model = SentenceTransformer('all-MiniLM-L6-v2')
doc_embeddings = model.encode(documents)

# Step 2: Embed query
query = "What about order #1766"
query_embedding = model.encode(query)

# Step 3: Find similar documents
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]
top_k = 3
top_indices = np.argsort(-similarities)[:top_k]

# Step 4: Return results
for i in top_indices:
    print(f"{documents[i]}: {similarities[i]:.3f}")

Output:

Your order #1766 has been confirmed: 0.892
Your order #1767 is being processed: 0.883
Your order #1765 will arrive tomorrow: 0.819

Problem: Orders #1767 and #1765 are returned with almost the same score as #1766!

When to Use Dense Retrieval

Good for: - Natural language questions - Semantic matching - When word choice varies widely

Bad for: - Exact identifiers (order numbers, SKUs, IDs) - Structured data lookup - Ambiguous queries

Dense Retrieval with Vector Database

In production, use a vector database:

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

client = QdrantClient(":memory:")
model = SentenceTransformer('all-MiniLM-L6-v2')

# Create collection and add documents
documents = ["...", "..."]
embeddings = model.encode(documents)

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Add with IDs and metadata
for i, (doc, emb) in enumerate(zip(documents, embeddings)):
    client.upsert(
        collection_name="documents",
        points=[PointStruct(id=i, vector=emb, payload={"text": doc})]
    )

# Query
query_embedding = model.encode("What about order #1766")
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=10
)

for result in results:
    print(f"{result.payload['text']}: {result.score:.3f}")

Dense + Multiple Embeddings Strategy

Some systems use multiple embedding models for different aspects:

Document: "Order #1766 shipped via FedEx"

Embedding Model 1 (semantic): [...]  # General semantics
Embedding Model 2 (order-specific): [...]  # Trained on order docs
Embedding Model 3 (logistics): [...]  # Trained on shipping docs

Query: "Where's my package"
→ Search with Model 3 (logistics-optimized)

This is more complex but improves quality for specialized domains.

Embedding Model Choice Matters

from sentence_transformers import SentenceTransformer

models_to_try = [
    'all-MiniLM-L6-v2',  # General purpose
    'all-mpnet-base-v2',  # Slightly better quality
    'multi-qa-MiniLM-L6-cos-v1',  # Optimized for Q&A
    'BAAI/bge-small-en-v1.5',  # New, good quality
]

query = "What's my order status?"

for model_name in models_to_try:
    model = SentenceTransformer(model_name)
    query_emb = model.encode(query)

    # The embeddings will be different!
    # Some models optimize better for Q&A

Reranking Results

Dense retrieval alone might return 100 documents with similar scores. Reranking uses a more powerful (but slower) model to rank them:

Dense Retrieval (fast, finds 100 candidates):
├─ Document A: 0.87
├─ Document B: 0.85
├─ Document C: 0.84
└─ ... 97 more

Cross-Encoder Reranking (slow, ranks top 10):
1. Document A: 0.95  (was top 1)
2. Document C: 0.92  (was #3)
3. Document B: 0.88  (was #2)

Cross-encoders often reorder results with higher precision.

Summary

Aspect	Dense Retrieval
What	Embedding-based semantic search
Similarity	Cosine similarity between vectors
Speed	⚡ Fast (HNSW indexing)
Strength	Captures meaning & synonyms
Weakness	Treats similar IDs as equivalent
Best for	Natural language questions

Key Limitation for This Tutorial

Dense retrieval alone cannot solve the Order #1766 problem.

You need additional strategies:

Hybrid Search — Combine with sparse/keyword
Metadata Filtering — Filter by exact ID
Chunking Strategies — Preserve IDs in text

Next Steps

→ Sparse Retrieval: BM25 — Learn how keyword search finds exact matches

→ Hybrid Search — The solution combining both