Lab 5: The Exact Match Problem – Why Semantic Search Fails
Level: Intermediate | Duration: 2 hours
Objective
Understand a fundamental limitation of semantic embeddings: they encode meaning rather than exact strings, so they fail when a query needs an exact match.
The Problem
User searches for: "Order #1766"
Semantic closest match: "Order #1767"
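Why? The two strings differ by a single character, so the model maps them to nearly identical embeddings. Embeddings capture overall meaning, not exact character sequences, and nothing in the vector marks #1766 and #1767 as different orders.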
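The effect can be reproduced without any model at all. The sketch below is a minimal illustration, assuming a toy "embedding" made of character trigram counts; `char_ngrams` and `cosine` are hypothetical helpers written for this example, not part of any embedding library, and a real semantic model will produce different numbers. What carries over is the pattern: a one-digit difference barely moves the vector.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (a toy stand-in for an embedding)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "Order #1766"
candidates = ["Order #1767", "Order #1766", "Invoice 2291"]

# Score every candidate against the query and print them best-first.
scores = {c: cosine(char_ngrams(query), char_ngrams(c)) for c in candidates}
for text, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {text}  exact={text == query}")
```

In this toy setup the wrong order scores close to the exact match and far above the unrelated invoice. A similarity threshold loose enough to tolerate paraphrases would therefore accept the wrong order as well.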
What You'll Learn
- Create datasets that expose the problem
- Visualize why embeddings fail on numbers
- Understand semantic vs exact matching
- Recognize when to use keyword search
- Preview the solution: hybrid search
- Analyze failure cases
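As a preview of the hybrid idea, the sketch below blends a fuzzy similarity score with an exact keyword score. It rests on several assumptions: `difflib.SequenceMatcher` stands in for embedding similarity (it is not a semantic model), the function names are hypothetical, and the weight `alpha=0.5` is an arbitrary illustrative value rather than a recommended setting.

```python
import re
from difflib import SequenceMatcher


def fuzzy_score(query: str, doc: str) -> float:
    """Stand-in for embedding similarity (assumption: not a real model)."""
    return SequenceMatcher(None, query.lower(), doc.lower()).ratio()


def keyword_score(query: str, doc: str) -> float:
    """Fraction of query tokens that appear verbatim in the document."""
    q_tokens = re.findall(r"\w+", query.lower())
    d_tokens = set(re.findall(r"\w+", doc.lower()))
    if not q_tokens:
        return 0.0
    return sum(t in d_tokens for t in q_tokens) / len(q_tokens)


def hybrid_search(query, docs, alpha=0.5):
    """Rank docs by alpha * fuzzy + (1 - alpha) * keyword, best first."""
    scored = [
        (alpha * fuzzy_score(query, d) + (1 - alpha) * keyword_score(query, d), d)
        for d in docs
    ]
    return sorted(scored, reverse=True)


docs = [
    "Order #1767 shipped on Monday",
    "Order #1766 is awaiting payment",
    "Invoice for order 1776",
]
results = hybrid_search("Order #1766", docs)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Only the document containing the token `1766` verbatim earns a full keyword score, and that is what lifts it to the top even though the fuzzy scores of the three documents are close together.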
Real-World Scenarios Where This Happens
- Product SKUs: "SKU-001" vs "SKU-002"
- Order IDs: "Order #1766" vs "Order #1767" (different orders, nearly identical text)
- Medical codes: ICD codes, diagnosis codes
- Reference numbers: Invoice numbers, tracking IDs
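All of these scenarios share one trait: the query contains a token with digits that must match exactly. One rough way to recognize such queries is sketched below. The heuristic (any whitespace-separated token containing a digit counts as an identifier) and the names `extract_identifiers` and `choose_strategy` are assumptions made for illustration. A production system would likely need domain-specific patterns for each identifier type.

```python
def extract_identifiers(query: str) -> list:
    """Return tokens that contain a digit, with surrounding punctuation removed."""
    tokens = (t.strip(".,;:!?()\"'") for t in query.split())
    return [t for t in tokens if any(ch.isdigit() for ch in t)]


def choose_strategy(query: str) -> str:
    """Route identifier-like queries to keyword search, everything else to semantic."""
    return "keyword" if extract_identifiers(query) else "semantic"


# Example queries; the code-like values are illustrative only.
for q in [
    "Where is Order #1766?",
    "SKU-001 availability",
    "diagnosis code E11.9",
    "how do I return an item",
]:
    print(f"{choose_strategy(q):8}  {extract_identifiers(q)}  {q}")
```

The digit test is deliberately crude: it also flags quantities and dates ("2 items", "March 5"), so it works better as a signal to add a keyword component than as a hard switch away from semantic search.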