Lab 1: Vector Math & Distance Metrics

Level: Foundations | Duration: 1.5 hours

Objective

Understand vectors and distance metrics from first principles. Build geometric intuition for why embeddings work.

What You'll Learn

Vector operations: magnitude, dot product, normalization
Distance metrics: Euclidean, Manhattan, Cosine similarity
Why cosine similarity is a common default for text embeddings
Visualize vectors in 2D and 3D space
Computational complexity of different metrics

Prerequisites

Lab 0 completed
Basic Python knowledge
Comfortable with mathematical notation

Core Concepts Refresher

A vector is an ordered list of numbers:

v = [3, 4] (2D vector)
v = [1, 2, 3] (3D vector)
v = [v₁, v₂, ..., vₙ] (n-dimensional vector)

# Exercise 1.1: Implement Vector Operations from Scratch (No NumPy!)
import math

class Vector:
    """A simple n-dimensional vector implemented from scratch"""
    
    def __init__(self, components):
        self.components = list(components)
        # Measure the stored list so generators and other iterables also work
        self.dim = len(self.components)
    
    def magnitude(self):
        """Calculate the length (magnitude) of the vector"""
        return math.sqrt(sum(x**2 for x in self.components))
    
    def dot_product(self, other):
        """Calculate dot product with another vector"""
        if self.dim != other.dim:
            raise ValueError("Vectors must have same dimension")
        return sum(a * b for a, b in zip(self.components, other.components))
    
    def normalize(self):
        """Return normalized version (magnitude = 1)"""
        mag = self.magnitude()
        if mag == 0:
            raise ValueError("Cannot normalize zero vector")
        return Vector([x / mag for x in self.components])
    
    def cosine_similarity(self, other):
        """Calculate cosine similarity with another vector"""
        dot = self.dot_product(other)
        mag1 = self.magnitude()
        mag2 = other.magnitude()
        if mag1 == 0 or mag2 == 0:
            return 0
        return dot / (mag1 * mag2)
    
    def __repr__(self):
        return f"Vector({self.components})"

# Test the implementation
v1 = Vector([3, 4])
v2 = Vector([1, 0])

print("Vector Operations Demo")
print("=" * 50)
print(f"v1 = {v1}")
print(f"v2 = {v2}\n")

print(f"Magnitude of v1: {v1.magnitude()}")
print(f"  (This is the length of the vector, √(3² + 4²) = 5)\n")

print(f"Dot product (v1 · v2): {v1.dot_product(v2)}")
print(f"  (This is 3*1 + 4*0 = 3)\n")

print(f"v1 normalized: {v1.normalize()}")
print(f"  (Unit vector pointing in same direction)\n")

print(f"Cosine similarity: {v1.cosine_similarity(v2):.4f}")
print(f"  (Ranges from -1 to 1, measures angle between vectors)")
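
The normalization and cosine similarity methods above are closely linked: once two vectors are scaled to unit length, their dot product is already their cosine similarity. The following is a minimal sketch of that relationship, using standalone helper functions (`dot` and `unit` are illustrative names, not part of the `Vector` class) and assuming neither input is the zero vector.

```python
import math

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    """Scale a to length 1 (assumes a is not the zero vector)."""
    mag = math.sqrt(dot(a, a))
    return [x / mag for x in a]

a, b = [3, 4], [1, 0]

# Cosine similarity computed the long way: dot / (|a| * |b|)
cos_direct = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# The same quantity as a plain dot product of the two unit vectors
cos_via_unit = dot(unit(a), unit(b))

print(f"direct:   {cos_direct:.4f}")
print(f"via unit: {cos_via_unit:.4f}")
```

Both lines print 0.6000 for these inputs. This is why it can pay to normalize vectors once and store them that way when many comparisons will follow.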

Section 2: Distance Metrics Comparison

There are several ways to measure how far apart two vectors are, each with different properties:

Metric	Formula	Best For	Range
Euclidean	√(Σ(aᵢ - bᵢ)²)	Physical distances, clustering	[0, ∞)
Manhattan	Σ|aᵢ - bᵢ|	Grid-like spaces, less sensitive to outliers than Euclidean	[0, ∞)
Cosine Similarity	(a · b) / (‖a‖ × ‖b‖)	Text embeddings, high-dimensional data	[-1, 1]

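Key Insight: For text embeddings (for example, 384-dimensional vectors), cosine similarity is a common default because it:

Depends only on direction, not magnitude
Is invariant to vector length, which in count-based representations grows with document length
Is cheap to compute: for unit-length vectors it reduces to a single dot product
Is easy to read (as a rough, model-dependent guide: 0.9 = very similar, 0.5 = somewhat related, 0.1 = different)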
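
The "direction, not magnitude" point can be illustrated with a small sketch. The word counts below are made up for illustration and stand in for real embeddings; the vocabulary and both documents are hypothetical. The second document has the same word proportions as the first but is ten times longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

def euclidean_distance(a, b):
    """Straight-line (L2) distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical word counts over the vocabulary ["cat", "dog", "fish"]
short_doc = [2, 1, 0]
long_doc = [20, 10, 0]  # same proportions, ten times as many words

cos_sim = cosine_similarity(short_doc, long_doc)
euclid = euclidean_distance(short_doc, long_doc)

print(f"cosine similarity:  {cos_sim:.4f}")
print(f"Euclidean distance: {euclid:.4f}")
```

Cosine similarity reports the two documents as pointing the same way (1.0000), while Euclidean distance reports them as far apart (about 20.12) purely because of the difference in length.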

# Exercise 2.1: Implement Distance Metrics
import math

def euclidean_distance(v1, v2):
    """L2 distance"""
    if len(v1) != len(v2):
        raise ValueError("Vectors must have same dimension")
    return math.sqrt(sum((a - b)**2 for a, b in zip(v1, v2)))

def manhattan_distance(v1, v2):
    """L1 distance"""
    if len(v1) != len(v2):
        raise ValueError("Vectors must have same dimension")
    return sum(abs(a - b) for a, b in zip(v1, v2))

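def cosine_similarity(v1, v2):
    """Cosine of the angle between v1 and v2.

    Range is [-1, 1] in general, [0, 1] when all components are non-negative.
    """
    if len(v1) != len(v2):
        raise ValueError("Vectors must have same dimension")
    dot = sum(a * b for a, b in zip(v1, v2))
    mag1 = math.sqrt(sum(a**2 for a in v1))
    mag2 = math.sqrt(sum(b**2 for b in v2))
    if mag1 == 0 or mag2 == 0:
        return 0  # convention: similarity with a zero vector is treated as 0
    return dot / (mag1 * mag2)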

# Compare metrics on sample vectors
test_pairs = [
    ([1, 0], [0, 1], "Orthogonal (perpendicular)"),
    ([1, 0], [1, 0.1], "Nearly same direction"),
    ([1, 0], [2, 0], "Same direction, different magnitude"),
    ([1, 0], [-1, 0], "Opposite direction"),
]

print("Distance Metrics Comparison")
print("=" * 80)
for v1, v2, desc in test_pairs:
    print(f"\n{desc}")
    print(f"  v1 = {v1}, v2 = {v2}")
    print(f"  Euclidean:       {euclidean_distance(v1, v2):.4f}")
    print(f"  Manhattan:       {manhattan_distance(v1, v2):.4f}")
    print(f"  Cosine Simil.:   {cosine_similarity(v1, v2):+.4f}")
    # Convert cosine to distance (1 - similarity)
    print(f"  Cosine Distance: {1 - cosine_similarity(v1, v2):.4f}")

Distance Metrics Comparison
================================================================================

Orthogonal (perpendicular)
  v1 = [1, 0], v2 = [0, 1]
  Euclidean:       1.4142
  Manhattan:       2.0000
  Cosine Simil.:   +0.0000
  Cosine Distance: 1.0000

Nearly same direction
  v1 = [1, 0], v2 = [1, 0.1]
  Euclidean:       0.1000
  Manhattan:       0.1000
  Cosine Simil.:   +0.9950
  Cosine Distance: 0.0050

Same direction, different magnitude
  v1 = [1, 0], v2 = [2, 0]
  Euclidean:       1.0000
  Manhattan:       1.0000
  Cosine Simil.:   +1.0000
  Cosine Distance: 0.0000

Opposite direction
  v1 = [1, 0], v2 = [-1, 0]
  Euclidean:       2.0000
  Manhattan:       2.0000
  Cosine Simil.:   -1.0000
  Cosine Distance: 2.0000
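
The output above shows Euclidean distance and cosine similarity disagreeing when magnitudes differ. Once both vectors are normalized to unit length, the two are tied together by the identity d² = 2(1 - cos θ), so ranking by one gives the same order as ranking by the other. The sketch below checks that identity numerically on the same four test pairs; the helper names are illustrative and the code assumes non-zero inputs.

```python
import math

def normalize(v):
    """Return v scaled to unit length (assumes v is not the zero vector)."""
    mag = math.sqrt(sum(x * x for x in v))
    return [x / mag for x in v]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

pairs = [([1, 0], [0, 1]), ([1, 0], [1, 0.1]), ([1, 0], [2, 0]), ([1, 0], [-1, 0])]

results = []
for a, b in pairs:
    # Squared Euclidean distance between the unit-length versions
    d_sq = euclidean_distance(normalize(a), normalize(b)) ** 2
    # Right-hand side of the identity, from the original vectors
    rhs = 2 * (1 - cosine_similarity(a, b))
    results.append((d_sq, rhs))
    print(f"{a} vs {b}: d^2 = {d_sq:.4f}, 2(1 - cos) = {rhs:.4f}")
```

The two columns agree up to floating-point rounding for every pair.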

# Exercise 2.2: Visualize Vectors in 2D Space
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Plot 1: Basic vectors
ax = axes[0]
v1, v2 = [1, 2], [2, 1]
ax.quiver(0, 0, v1[0], v1[1], angles='xy', scale_units='xy', scale=1, color='red', label='v1')
ax.quiver(0, 0, v2[0], v2[1], angles='xy', scale_units='xy', scale=1, color='blue', label='v2')
ax.set_xlim(-0.5, 3), ax.set_ylim(-0.5, 3)
ax.grid(True), ax.set_aspect('equal')
ax.legend(), ax.set_title(f'Vectors\ncos_similarity = {cosine_similarity(v1, v2):.3f}')
ax.set_xlabel('x'), ax.set_ylabel('y')

# Plot 2: Distance illustration
ax = axes[1]
points = [(0, 0), (3, 4), (4, 2), (1, 3)]
colors = ['red', 'blue', 'green', 'orange']  # 'white' would be invisible on the default background
for (x, y), c, label in zip(points, colors, ['Origin', 'v1', 'v2', 'v3']):
    ax.plot(x, y, 'o', color=c, markersize=10, label=label)
# Add lines showing distances
ax.plot([points[1][0], points[2][0]], [points[1][1], points[2][1]], 'k--', alpha=0.5)
ax.set_xlim(-1, 5), ax.set_ylim(-1, 5)
ax.grid(True), ax.set_aspect('equal')
ax.legend(), ax.set_title(f'Distances\nEuclidean = {euclidean_distance([3,4],[4,2]):.2f}')
ax.set_xlabel('x'), ax.set_ylabel('y')

# Plot 3: Cosine similarity visualization
ax = axes[2]
angles = np.linspace(0, 2*np.pi, 100)
similarity_scores = []
for angle in angles:
    v_angle = [np.cos(angle), np.sin(angle)]
    v_ref = [1, 0]
    similarity_scores.append(cosine_similarity(v_ref, v_angle))

ax.plot(np.degrees(angles), similarity_scores, 'o-', markersize=3)
ax.axhline(y=0, color='k', linestyle='-', alpha=0.3)
ax.axhline(y=0.5, color='g', linestyle='--', alpha=0.5, label='0.5 threshold')
ax.fill_between(np.degrees(angles), 0.5, 1, alpha=0.2, color='green')
ax.set_xlabel('Angle (degrees)'), ax.set_ylabel('Cosine Similarity')
ax.set_title('Cosine Similarity vs Angle'), ax.grid(True)
ax.legend()

plt.tight_layout()
plt.show()

print("\n📊 Key Observations:")
print("  - Cosine similarity is 1 when vectors point in same direction (0°)")
print("  - Cosine similarity is 0 when vectors are perpendicular (90°)")
print("  - Cosine similarity is -1 when vectors point opposite (180°)")

[Figure: three panels titled "Vectors", "Distances", and "Cosine Similarity vs Angle"]

📊 Key Observations:
  - Cosine similarity is 1 when vectors point in same direction (0°)
  - Cosine similarity is 0 when vectors are perpendicular (90°)
  - Cosine similarity is -1 when vectors point opposite (180°)
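
Because cosine similarity is the cosine of the angle between two vectors, the angle itself can be recovered with the inverse cosine. The following sketch uses only the standard `math` module; `angle_degrees` is an illustrative helper name, and the clamp is there because floating-point rounding can push the ratio slightly outside [-1, 1].

```python
import math

def angle_degrees(a, b):
    """Angle between a and b in degrees (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    cos_theta = dot / (mag_a * mag_b)
    # Clamp to the valid domain of acos to guard against rounding error
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

perpendicular = angle_degrees([1, 0], [0, 1])
opposite = angle_degrees([1, 0], [-1, 0])
plotted_pair = angle_degrees([1, 2], [2, 1])  # the v1, v2 used in the first plot

print(f"perpendicular: {perpendicular:.2f}°")
print(f"opposite:      {opposite:.2f}°")
print(f"[1,2] vs [2,1]: {plotted_pair:.2f}°")
```

The pair from the first plot has cosine similarity 0.8, which corresponds to an angle of about 36.87°.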

Challenge Exercise (Optional)

Problem: Given vectors v1 = [1, 2, 3] and v2 = [4, 5, 6], calculate:

Magnitude of each vector
Normalized versions of each vector
Dot product
All three distance metrics

Then predict which metric would work best for finding similar documents.
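
After working the challenge by hand, one way to check the numbers is a short script like the sketch below. It uses plain Python with the `math` module and the same formulas as the exercises above; the variable names are illustrative.

```python
import math

v1 = [1, 2, 3]
v2 = [4, 5, 6]

# Magnitudes and unit-length versions
mag1 = math.sqrt(sum(x * x for x in v1))
mag2 = math.sqrt(sum(x * x for x in v2))
unit1 = [x / mag1 for x in v1]
unit2 = [x / mag2 for x in v2]

# Dot product and the three metrics
dot = sum(a * b for a, b in zip(v1, v2))
euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
manhattan = sum(abs(a - b) for a, b in zip(v1, v2))
cos_sim = dot / (mag1 * mag2)

print(f"magnitudes:  {mag1:.4f}, {mag2:.4f}")
print(f"unit v1:     {[round(x, 4) for x in unit1]}")
print(f"unit v2:     {[round(x, 4) for x in unit2]}")
print(f"dot product: {dot}")
print(f"Euclidean:   {euclid:.4f}")
print(f"Manhattan:   {manhattan}")
print(f"cosine sim.: {cos_sim:.4f}")
```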

Summary & Key Takeaways

✓ Vectors are ordered lists of numbers (coordinates in space)

✓ Magnitude is the length of a vector: √(sum of squares)

✓ Dot product measures how aligned two vectors are

✓ Cosine similarity is a strong default for text embeddings because it:

Depends only on direction (angle), not scale
Is not skewed by vector length, which can vary with text length
Is cheap to compute (a single dot product for unit-length vectors)

✓ Why embeddings work:

Similar texts → similar vectors
Different texts → different vectors
Distance metric quantifies similarity

Real-World Application

When you ask your RAG system:

"What's the capital of France?"

The system:

Converts "What's the capital of France?" → embedding (for example, a 384-dimensional vector)
Compares it to stored document embeddings using cosine similarity
Returns documents with highest similarity scores
LLM writes answer based on retrieved texts
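
The comparison and ranking steps can be sketched with toy data. The three-dimensional "embeddings" below are hand-written, hypothetical numbers chosen for illustration only; a real embedding model would produce the vectors, and a real system would typically use a vector index rather than a full sort.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

# Hypothetical document embeddings (made-up values, not model output)
documents = {
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "Berlin is the capital of Germany.": [0.6, 0.6, 0.1],
    "Bananas are rich in potassium.": [0.0, 0.1, 0.9],
}

# Hypothetical embedding standing in for "What's the capital of France?"
query_embedding = [0.8, 0.2, 0.1]

# Score every document against the query, highest similarity first
ranked = sorted(
    documents.items(),
    key=lambda item: cosine_similarity(query_embedding, item[1]),
    reverse=True,
)

for text, embedding in ranked:
    score = cosine_similarity(query_embedding, embedding)
    print(f"{score:.4f}  {text}")
```

With these made-up vectors the France sentence ranks first and the banana sentence last; the top results are what would be handed to the LLM as context.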

The embedding model places texts with similar meaning close together, so cosine similarity can surface related documents even when the wording differs.

Lab 1 Complete! ✅

You now understand the mathematical foundation of embeddings and why cosine similarity works. Ready for Lab 2: Creating real text embeddings.