# Models
sgrep ships with Model2Vec by default — an 8M-parameter static embedding model with 256 dimensions. It runs on CPU, requires no GPU, and downloads as a 7.5MB int8-quantized binary.
## Available Models
| Alias | Model | Dims | Size | Notes |
|---|---|---|---|---|
| `model2vec` (default) | minishlab/potion-base-8M | 256 | 7.5MB (int8) | Best balance of speed and quality |
| `code-search` | henrikalbihn/sgrep-code-v1 | 256 | ~8MB (int8) | Fine-tuned for code search |
| `bge-small` | BAAI/bge-small-en-v1.5 | 384 | 33MB | Higher quality, slower |
| `clip` | openai/clip-vit-base-patch32 | 512 | 350MB | Text + image search |
| `simple` | — | — | 0MB | No download; text-only heuristic (for testing) |
## Switching Models
```bash
sgrep --model bge-small "query" src/
sgrep --model code-search "query" src/
```
## How It Works
Model2Vec is a static embedding model — no attention layers, no transformer inference. Each token is looked up in an embedding table, then mean-pooled and normalized. This makes it:
- Fast: sub-10ms for typical queries
- Small: 7.5MB int8 vs 33MB+ for transformer models
- Offline: no API calls, no network after first download
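The whole encode path is a handful of array operations. Here is a minimal NumPy sketch of the idea (illustrative only; the table, vocabulary size, and token IDs are stand-ins, not sgrep internals):

```python
import numpy as np

# Stand-in lookup table: one pre-computed vector per token ID.
# A Model2Vec-style model ships this as a single (vocab_size x dims) matrix.
EMBEDDINGS = np.random.randn(30_000, 256).astype(np.float32)  # random stand-in weights

def embed(token_ids: list[int]) -> np.ndarray:
    """Static embedding: table lookup -> mean pool -> L2 normalize."""
    vectors = EMBEDDINGS[token_ids]          # (n_tokens, 256): pure lookup, no attention
    pooled = vectors.mean(axis=0)            # mean pool across tokens
    return pooled / np.linalg.norm(pooled)   # unit-normalize for cosine similarity

query_vec = embed([101, 2054, 2003])  # token IDs come from a tokenizer (not shown)
```

Because there is no transformer forward pass, query latency is dominated by tokenization and a single matrix gather, which is why sub-10ms queries are achievable on CPU.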
## Int8 Quantization
All models are automatically quantized to int8 on download. This reduces size by ~4x with minimal quality loss:
- f32 embeddings: ~30MB → int8: ~7.5MB
- Quality impact: less than 1% on retrieval benchmarks
- The quantization scale factor is stored in `config.json` alongside the model
HuggingFace models that ship as f32 are quantized at download time — you don’t need to do anything.
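For readers curious about the mechanics, here is a minimal sketch of symmetric int8 quantization with a single scale factor, the general technique described above (not sgrep's actual code):

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric int8 quantization: map [-max, max] onto [-127, 127]."""
    scale = float(np.abs(weights).max()) / 127.0   # stored alongside the model
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate f32 values at load time."""
    return q.astype(np.float32) * scale

weights = np.random.randn(30_000, 256).astype(np.float32)  # stand-in embedding table
q, scale = quantize_int8(weights)
# int8 stores 1 byte per value vs. 4 for f32, hence the ~4x size reduction cited above.
```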
## Model Cache
Models auto-download on first use and are cached in `~/.cache/sgrep/models/`. To use a model from a local directory:

```bash
sgrep --model model2vec --model-path ./my-model/ "query" src/
```
The model directory should contain `model.safetensors`, `tokenizer.json`, and optionally `config.json`.
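As a sketch of the resolution order this implies (the helper and per-alias cache layout below are assumptions for illustration, not sgrep's actual source):

```python
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "sgrep" / "models"

def resolve_model_dir(alias: str, model_path: str | None = None) -> Path:
    """Prefer an explicit --model-path; otherwise fall back to the cache."""
    root = Path(model_path) if model_path else CACHE_DIR / alias  # assumed layout
    if not (root / "model.safetensors").exists():
        # The real tool would trigger a download here; this sketch just reports.
        raise FileNotFoundError(f"missing model.safetensors in {root}")
    return root
```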