Models

sgrep ships with Model2Vec by default: an 8M-parameter static embedding model producing 256-dimensional vectors. It runs on CPU, requires no GPU, and downloads as a 7.5MB int8-quantized binary.

Available Models

| Alias | Model | Dims | Size | Notes |
|---|---|---|---|---|
| model2vec (default) | minishlab/potion-base-8M | 256 | 7.5MB (int8) | Best balance of speed and quality |
| code-search | henrikalbihn/sgrep-code-v1 | 256 | ~8MB (int8) | Fine-tuned for code search |
| bge-small | BAAI/bge-small-en-v1.5 | 384 | 33MB | Higher quality, slower |
| clip | openai/clip-vit-base-patch32 | 512 | 350MB | Text + image search |
| simple | — | — | 0MB | No download, text-only heuristic (for testing) |

Switching Models

sgrep --model bge-small "query" src/
sgrep --model code-search "query" src/

How It Works

Model2Vec is a static embedding model — no attention layers, no transformer inference. Each token is looked up in a precomputed embedding table, then the token vectors are mean-pooled and L2-normalized. This makes it:

  • Fast: sub-10ms for typical queries
  • Small: 7.5MB int8 vs 33MB+ for transformer models
  • Offline: no API calls, no network after first download
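The lookup-pool-normalize pipeline described above can be sketched in a few lines of NumPy. The vocabulary and embedding table here are toy stand-ins, not the real Model2Vec weights, and the whitespace tokenizer is a simplification of the actual tokenizer:

```python
import numpy as np

# Toy vocabulary and embedding table (illustrative; a real Model2Vec table
# covers the full tokenizer vocabulary, 256 dimensions per token).
vocab = {"semantic": 0, "search": 1, "query": 2}
table = np.random.default_rng(0).normal(size=(len(vocab), 256)).astype(np.float32)

def embed(text: str) -> np.ndarray:
    """Static embedding: per-token table lookup, mean pool, L2-normalize."""
    ids = [vocab[tok] for tok in text.lower().split() if tok in vocab]
    vec = table[ids].mean(axis=0)      # mean-pool the token vectors
    return vec / np.linalg.norm(vec)   # unit-normalize for cosine similarity

q = embed("semantic search")
print(q.shape, float(np.linalg.norm(q)))
```

Because there is no forward pass through a transformer, the whole query embedding reduces to a few array operations — which is why typical queries come in under 10ms.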

Int8 Quantization

All models are automatically quantized to int8 on download. This reduces size by ~4x with minimal quality loss:

  • f32 embeddings: ~30MB → int8: ~7.5MB
  • Quality impact: less than 1% on retrieval benchmarks
  • The quantization scale factor is stored in config.json alongside the model

HuggingFace models that ship as f32 are quantized at download time — you don’t need to do anything.
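A minimal sketch of the symmetric int8 scheme described above — one global scale factor, quantize on download, dequantize at load. The random weights and the exact rounding details here are illustrative, not sgrep's actual implementation:

```python
import numpy as np

# Stand-in for f32 embedding weights downloaded from HuggingFace.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(1000, 256)).astype(np.float32)

# Symmetric quantization: map the largest-magnitude weight to +/-127.
# sgrep stores this scale alongside the model in config.json.
scale = float(np.abs(weights).max()) / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize at load time.
restored = q.astype(np.float32) * scale

print(q.nbytes / weights.nbytes)  # 0.25 — the ~4x size reduction
max_err = float(np.abs(restored - weights).max())
```

The maximum round-trip error is bounded by half the scale factor, which is why retrieval quality drops by less than 1%: ranking by cosine similarity is insensitive to per-weight noise that small.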

Model Cache

Models auto-download on first use and cache in ~/.cache/sgrep/models/. To use a model from a local directory:

sgrep --model model2vec --model-path ./my-model/ "query" src/

The model directory should contain model.safetensors, tokenizer.json, and optionally config.json.
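A small sketch of validating that layout before pointing `--model-path` at a directory — the helper name and the demo directory are hypothetical, but the required file names are the ones listed above:

```python
import tempfile
from pathlib import Path

REQUIRED = ("model.safetensors", "tokenizer.json")  # config.json is optional

def missing_files(model_dir: Path) -> list[str]:
    """Return required model files absent from a local model directory."""
    return [name for name in REQUIRED if not (model_dir / name).is_file()]

# Demo: a directory that has the weights but no tokenizer.
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "model.safetensors").touch()
    missing = missing_files(d)
    print(missing)  # ['tokenizer.json']
```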
