Sparse Vectors
Sparse vectors are high-dimensional vectors in which most values are zero. Each non-zero entry holds the weight of one specific term, so the vector records which terms matter and by how much.
Dense vs Sparse
| Dense Vectors | Sparse Vectors |
|---|---|
| Most or all values non-zero | Mostly zeros |
| Hundreds to a few thousand dimensions (e.g. 1024) | 30,000+ dimensions (one per vocabulary term) |
| Semantic meaning | Term importance |
| Similar meaning = similar vector | Shared terms = similar vector |
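
To make the "shared terms = similar vector" row concrete, here is a minimal sketch of how a sparse vector can be stored and compared. The six-word vocabulary, the weights, and the helper names `to_sparse` and `sparse_dot` are illustrative assumptions, not part of any particular library.

```python
from typing import Dict, List


def to_sparse(values: List[float], vocab: List[str]) -> Dict[str, float]:
    """Keep only the non-zero entries, keyed by vocabulary term."""
    return {term: v for term, v in zip(vocab, values) if v != 0.0}


def sparse_dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product over the terms the two vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[t] for t, w in a.items() if t in b)


# Toy vocabulary and weights, made up for illustration.
vocab = ["cat", "dog", "learning", "machine", "model", "tree"]
query = to_sparse([0.0, 0.0, 0.9, 0.8, 0.0, 0.0], vocab)
doc_a = to_sparse([0.0, 0.0, 0.5, 0.0, 0.7, 0.0], vocab)
doc_b = to_sparse([0.6, 0.4, 0.0, 0.0, 0.0, 0.0], vocab)

print(query)                     # only "learning" and "machine" are stored
print(sparse_dot(query, doc_a))  # non-zero: both contain "learning"
print(sparse_dot(query, doc_b))  # zero: no shared terms
```

Only the non-zero entries are kept, so storage and comparison cost scale with the number of active terms rather than with the size of the vocabulary.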
SPLADE
SPLADE (Sparse Lexical and Expansion model) creates sparse vectors by combining two steps:
- Term weighting - Assigns an importance weight to each term in the input
- Term expansion - Adds related terms that do not appear in the input, each with its own weight
```
Input: "machine learning"
Output: {
  "machine": 0.8,
  "learning": 0.9,
  "ML": 0.6,
  "artificial": 0.4,
  "intelligence": 0.4,
  "model": 0.3,
  ...
}
```
Benefits
- Interpretable - You can see which terms matter
- Efficient - Only store non-zero values
- Expansion - Handles synonyms and related terms
- Complementary - Works well with dense vectors
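
One common way to combine sparse and dense retrieval is reciprocal rank fusion (RRF), which merges the two ranked lists without needing their scores to be on the same scale. The sketch below is an assumption about how the combination could be done, not a description of this service's internals. The document IDs are made up, and `k = 60` is the conventional default for RRF.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Sum 1 / (k + rank) for each document across all ranked lists."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


# Hypothetical result lists, best match first.
dense_ranking = ["doc2", "doc1", "doc3"]
sparse_ranking = ["doc1", "doc4", "doc2"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
ranked = sorted(fused, key=fused.get, reverse=True)
print(ranked)
```

Documents that appear near the top of both lists rise to the top of the fused list, while a document found by only one retriever is still kept.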
Configuration
```python
# `client` is assumed to be an already-initialized search client.
results = client.search(
    query="...",
    options={
        "sparse_model": "splade-v3",
        "expansion_factor": 1.2,  # More expansion
    },
)
```
When Sparse Shines
- Rare terms: Technical jargon, product codes
- Exact matching: Names, IDs, specific phrases
- Low-resource languages: Where dense models struggle
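
The rare-term case can be illustrated with a small inverted index, the structure typically used to search sparse vectors. This is a sketch under stated assumptions: the documents, the product code `xj-4500`, the weights, and the helpers `build_inverted_index` and `search` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]


def build_inverted_index(docs: Dict[str, SparseVector]) -> Dict[str, List[Tuple[str, float]]]:
    """Map each term to the (doc_id, weight) pairs of documents that contain it."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return index


def search(index: Dict[str, List[Tuple[str, float]]], query: SparseVector,
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score only documents that share at least one term with the query."""
    scores: Dict[str, float] = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Made-up sparse vectors; the product code is a rare, highly weighted term.
docs = {
    "manual": {"xj-4500": 1.2, "filter": 0.6, "replacement": 0.5},
    "catalog": {"filter": 0.7, "pump": 0.4},
    "blog": {"gardening": 0.9, "tips": 0.3},
}
index = build_inverted_index(docs)
results = search(index, {"xj-4500": 1.5, "filter": 0.4})
print(results)
```

The document containing the exact product code ranks first, and the document with no shared terms is never scored at all.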