What Are Embeddings?
An embedding is a representation of text as a vector of numbers. Instead of storing text directly, we store numbers that capture its meaning.
The Intuition
Imagine a 2D space (like a piece of paper) where:
- Words with similar meanings are placed close to each other
- Words with different meanings are placed far apart
For example:
In reality, embeddings are in very high dimensions (hundreds or thousands), but the principle is the same: nearby vectors = similar meanings.
Concrete Example
Let's say we have three sentences:
- "The cat sat on the mat"
- "A feline rested on a rug"
- "The dog chased the ball"
An embedding model might produce:
| Sentence | Embedding (384-dimensional) |
|---|---|
| Sentence 1 | [0.12, -0.45, 0.73, ..., 0.02] |
| Sentence 2 | [0.11, -0.43, 0.71, ..., 0.03] |
| Sentence 3 | [-0.31, 0.28, -0.12, ..., 0.89] |
Notice: - Sentences 1 and 2 have vectors that are very similar (they describe similar situations) - Sentence 3 has a vector quite different (different scenario)
We can measure this with cosine similarity (from the linear algebra section):
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer('all-MiniLM-L6-v2')
sent1 = "The cat sat on the mat"
sent2 = "A feline rested on a rug"
sent3 = "The dog chased the ball"
emb1 = model.encode(sent1)
emb2 = model.encode(sent2)
emb3 = model.encode(sent3)
print(f"emb1 shape: {emb1.shape}") # (384,)
print(f"Similarity 1-2: {np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)):.3f}")
print(f"Similarity 1-3: {np.dot(emb1, emb3) / (np.linalg.norm(emb1) * np.linalg.norm(emb3)):.3f}")
Why Embeddings Capture Meaning
Neural networks are trained on massive amounts of text. During training, the network learns that:
- Words appearing in similar contexts should have similar embeddings
- "King" and "Queen" appear in similar contexts
- Therefore, their embeddings are close
- We can perform analogy: "King - Man + Woman ≈ Queen"
This is a powerful emergent property—nobody explicitly told the network to encode these relationships!
Limitations: The Order #1766 Problem
Here's the critical issue for exact match search:
From an embedding model's perspective: - "Order #1766" appears in contexts like "your order", "confirmed order", "order total" - "Order #1767" appears in very similar contexts - Therefore, the embedding treats them as nearly identical!
But they're NOT the same! They're different orders.
The embedding model is doing exactly what it's designed to do—capture semantic similarity. But for structured data like order numbers, we need different approaches. (This is solved in The Exact Match Problem.)
Example: What a Real Embedding Looks Like
Using a real embedding model:
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer('all-MiniLM-L6-v2') # 384-dimensional
text = "What is the meaning of life?"
embedding = model.encode(text)
print(f"Embedding shape: {embedding.shape}") # (384,)
print(f"First 10 values: {embedding[:10]}")
# Output: [-0.08 0.23 -0.15 0.42 0.01 -0.09 0.67 -0.32 0.11 0.03]
print(f"Min: {embedding.min():.3f}, Max: {embedding.max():.3f}")
# Output: Min: -0.999, Max: 0.893
# Magnitude after normalization
magnitude = np.linalg.norm(embedding)
print(f"Magnitude: {magnitude:.3f}") # Should be ~1.0 for normalized embeddings
Types of Embeddings
- Word embeddings (Word2Vec, GloVe): Single words → vectors
- Sentence embeddings (Sentence-BERT): Entire sentences → vectors
- Document embeddings: Entire documents → vectors
- Contextual embeddings (BERT, GPT): Same word has different vectors in different contexts
For RAG systems, we typically use document/passage embeddings (short documents or chunks).
Dimensions and Trade-offs
| Model | Dimensions | Speed | Quality | Use Case |
|---|---|---|---|---|
| ONNX mini models | 96 | Very Fast | Good | Real-time, resource-constrained |
| MiniLM | 384 | Fast | Very Good | Most RAG systems |
| BGE | 768 | Medium | Excellent | Production systems |
| OpenAI text-embedding-3-large | 3072 | Slow | State-of-art | When money/latency not a concern |
Higher dimensions = more expressive (better quality) but slower and more storage.
How to Create Embeddings
Option 1: Using Pre-trained Models (Recommended)
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
documents = [
"This is the first document",
"Here is the second document",
"And the third one"
]
embeddings = model.encode(documents)
print(embeddings.shape) # (3, 384)
Option 2: Using OpenAI API
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(
model="text-embedding-3-small",
input="Strawberry shortcake"
)
embedding = response.data[0].embedding
print(f"Embedding length: {len(embedding)}") # 1536
Option 3: Using Hugging Face Transformers
from transformers import AutoTokenizer, AutoModel
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
text = "This is a sample text"
inputs = tokenizer(text, return_tensors="pt")
# Get embeddings
with torch.no_grad():
outputs = model(**inputs)
embeddings = outputs.last_hidden_state[:, 0, :] # Take [CLS] token
print(embeddings.shape)
Summary
| Concept | Definition |
|---|---|
| Embedding | Vector of numbers representing text meaning |
| Embedding dimension | Length of vector (e.g., 384) |
| Cosine similarity | How similar two embeddings are (0-1) |
| Context | Texts with similar context get similar embeddings |
| Limitation | Treats "Order #1766" and "Order #1767" as similar |
Next Steps
Ready to understand how embedding models work? → Embedding Models
Or skip ahead to Distance Metrics if you want to learn how to use embeddings for search.