Sparse Vectors

Sparse vectors are high-dimensional vectors in which most values are zero. Each non-zero dimension typically corresponds to a vocabulary term and stores how important that term is to the text.
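Because most entries are zero, a sparse vector is usually stored as only its non-zero (index, weight) pairs. Here is a minimal sketch in plain Python. It assumes a 30,000-slot vocabulary, and the two term weights are made up for illustration.

```python
def to_sparse(dense):
    """Keep only the non-zero entries of a dense list as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def to_dense(sparse, dim):
    """Expand {index: value} back into a full list of length `dim`."""
    dense = [0.0] * dim
    for i, v in sparse.items():
        dense[i] = v
    return dense


vec = [0.0] * 30_000          # one slot per vocabulary entry (assumed size)
vec[3], vec[7] = 0.8, 0.9     # two hypothetical non-zero term weights
sparse = to_sparse(vec)
print(sparse)                              # {3: 0.8, 7: 0.9}
print(to_dense(sparse, len(vec)) == vec)   # True: the round trip is lossless
```

Two stored pairs carry the same information as 30,000 slots, which is why sparse storage stays small even with vocabulary-sized dimensions.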

Dense vs Sparse

Dense Vectors	Sparse Vectors
A value in every dimension	Mostly zeros
Typically hundreds to a few thousand dimensions (e.g. 1024)	Vocabulary-sized, often 30,000+ dimensions
Capture semantic meaning	Capture term importance
Similar meaning = similar vector	Shared terms = similar vector
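The last row follows from how similarity is computed. In a dot product between two sparse vectors, only terms present in both vectors contribute, so a document that shares no terms with the query scores zero. Here is a minimal sketch that represents vectors as `{term: weight}` dicts. The weights are made up.

```python
def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    # Iterate over the smaller vector; only terms found in both contribute.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)


query = {"machine": 0.8, "learning": 0.9}
doc_a = {"machine": 0.5, "learning": 0.7, "model": 0.2}
doc_b = {"deep": 0.6, "networks": 0.4}

print(sparse_dot(query, doc_a))  # two shared terms contribute
print(sparse_dot(query, doc_b))  # 0: no shared terms
```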

SPLADE

SPLADE (Sparse Lexical and Expansion) uses a BERT-style masked language model to create sparse vectors by:

Term weighting - Assigns an importance score to each term
Term expansion - Adds related terms that do not appear in the input text, for queries and documents alike

```
Input: "machine learning"
Output: {
  "machine": 0.8,
  "learning": 0.9,
  "ML": 0.6,
  "artificial": 0.4,
  "intelligence": 0.4,
  "model": 0.3,
  ...
}
```
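Weights like these come from the masked language model head. For every input token, the model produces a logit for each vocabulary entry. SPLADE v2 and later variants turn these logits into one weight per vocabulary term by taking the maximum over input tokens of log(1 + ReLU(logit)). The sketch below uses a five-word toy vocabulary and hand-written logits in place of real model output. It also skips tokenization and padding masks, so treat it as an illustration of the pooling step only.

```python
import math

# Toy vocabulary and hypothetical MLM logits (not real model output):
# one row per input token, one column per vocabulary entry.
VOCAB = ["machine", "learning", "ML", "model", "banana"]
LOGITS = [
    [2.1, 0.3, 1.2, 0.4, -3.0],  # input token "machine"
    [0.2, 2.4, 1.0, 0.5, -2.5],  # input token "learning"
]


def splade_weights(logits, vocab):
    """Max-pool log(1 + ReLU(logit)) over input tokens; keep non-zero terms."""
    weights = {}
    for j, term in enumerate(vocab):
        # ReLU drops negative logits; log1p dampens large positive ones.
        w = max(math.log1p(max(row[j], 0.0)) for row in logits)
        if w > 0.0:  # terms with no positive logit stay zero (sparsity)
            weights[term] = w
    return weights


weights = splade_weights(LOGITS, VOCAB)
# "ML" and "model" receive weight although neither appears in the input
# (expansion); "banana" has only negative logits, so it is dropped.
print(sorted(weights, key=weights.get, reverse=True))
```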

Benefits

Interpretable - You can see which terms matter
Efficient - Only non-zero values are stored, and vectors can be searched with an inverted index
Expansion - Handles synonyms and related terms
Complementary - Pairs well with dense vectors in hybrid search
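The efficiency benefit is easiest to see with an inverted index, which maps each term to the documents that contain it. Scoring then reads only the posting lists for the query's non-zero terms, so the search never visits documents that share no terms with the query. Here is a minimal in-memory sketch over a made-up three-document corpus. Production engines layer compression and pruning on top of this idea.

```python
from collections import defaultdict

# Toy corpus of sparse document vectors ({term: weight}); weights are made up.
DOCS = {
    "d1": {"machine": 0.7, "learning": 0.6, "model": 0.3},
    "d2": {"banana": 0.9, "recipe": 0.5},
    "d3": {"ML": 0.8, "model": 0.4},
}


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query, top_k=10):
    """Score documents by sparse dot product, reading only the query's postings."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


index = build_inverted_index(DOCS)
query = {"machine": 0.8, "learning": 0.9, "ML": 0.6}
print(search(index, query))  # d2 shares no terms, so it is never scored
```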

Configuration

```python
# Assumes `client` is an already-initialized search client
results = client.search(
    query="...",
    options={
        "sparse_model": "splade-v3",
        "expansion_factor": 1.2  # More expansion
    }
)
```

When Sparse Shines

Rare terms: Technical jargon, product codes
Exact matching: Names, IDs, distinctive keywords
Low-resource languages: Where dense models struggle