# Models
sgrep ships with Model2Vec by default — an 8M-parameter static embedding model with 256 dimensions. It runs on CPU, requires no GPU, and downloads as a 7.5MB int8-quantized binary.
## Available Models
| Alias | Model | Dims | Size | Notes |
|---|---|---|---|---|
| `model2vec` (default) | minishlab/potion-base-8M | 256 | 7.5MB (int8) | Best balance of speed and quality |
| `code-search` | henrikalbihn/sgrep-code-v1 | 256 | ~8MB (int8) | Fine-tuned for code search |
| `bge-small` | BAAI/bge-small-en-v1.5 | 384 | 33MB | Higher quality, slower |
| `clip` | openai/clip-vit-base-patch32 | 512 | 350MB | Text + image search |
| `simple` | — | — | 0MB | No download; text-only heuristic (for testing) |
## Switching Models
```bash
sgrep --model bge-small "query" src/
sgrep --model code-search "query" src/
```
## How It Works
Model2Vec is a static embedding model — no attention layers, no transformer inference. Each token is looked up in an embedding table, then mean-pooled and normalized. This makes it:
- Fast: sub-10ms for typical queries
- Small: 7.5MB int8 vs 33MB+ for transformer models
- Offline: no API calls, no network after first download
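The whole encode path is a handful of array operations. Here is a minimal NumPy sketch of the idea (illustrative only; the table, vocabulary size, and token IDs are stand-ins, not sgrep internals):

```python
import numpy as np

# Stand-in lookup table: one pre-computed vector per token ID.
# A Model2Vec-style model ships this as a single (vocab_size x dims) matrix.
EMBEDDINGS = np.random.randn(30_000, 256).astype(np.float32)  # random stand-in weights

def embed(token_ids: list[int]) -> np.ndarray:
    """Static embedding: table lookup -> mean pool -> L2 normalize."""
    vectors = EMBEDDINGS[token_ids]          # (n_tokens, 256): pure lookup, no attention
    pooled = vectors.mean(axis=0)            # mean pool across tokens
    return pooled / np.linalg.norm(pooled)   # unit-normalize for cosine similarity

query_vec = embed([101, 2054, 2003])  # token IDs come from a tokenizer (not shown)
```

Because there is no transformer forward pass, query latency is dominated by tokenization and a single matrix gather, which is why sub-10ms queries are achievable on CPU.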
## Int8 Quantization
All models are automatically quantized to int8 on download. This reduces size by ~4x with minimal quality loss:
- f32 embeddings: ~30MB → int8: ~7.5MB
- Quality impact: less than 1% on retrieval benchmarks
- The quantization scale factor is stored in `config.json` alongside the model
HuggingFace models that ship as f32 are quantized at download time — you don’t need to do anything.
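For readers curious about the mechanics, here is a minimal sketch of symmetric int8 quantization with a single scale factor, the general technique described above (not sgrep's actual code):

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric int8 quantization: map [-max, max] onto [-127, 127]."""
    scale = float(np.abs(weights).max()) / 127.0   # stored alongside the model
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate f32 values at load time."""
    return q.astype(np.float32) * scale

weights = np.random.randn(30_000, 256).astype(np.float32)  # stand-in embedding table
q, scale = quantize_int8(weights)
# int8 stores 1 byte per value vs. 4 for f32, hence the ~4x size reduction cited above.
```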
## Model Cache
Models auto-download on first use and are cached in `~/.cache/sgrep/models/`. To use a model from a local directory:

```bash
sgrep --model model2vec --model-path ./my-model/ "query" src/
```
The model directory should contain `model.safetensors`, `tokenizer.json`, and optionally `config.json`.
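As a sketch of the resolution order this implies (the helper and per-alias cache layout below are assumptions for illustration, not sgrep's actual source):

```python
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "sgrep" / "models"

def resolve_model_dir(alias: str, model_path: str | None = None) -> Path:
    """Prefer an explicit --model-path; otherwise fall back to the cache."""
    root = Path(model_path) if model_path else CACHE_DIR / alias  # assumed layout
    if not (root / "model.safetensors").exists():
        # The real tool would trigger a download here; this sketch just reports.
        raise FileNotFoundError(f"missing model.safetensors in {root}")
    return root
```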