# Models

sgrep ships with Model2Vec by default: an 8M-parameter static embedding model that produces 256-dimensional vectors. It runs on CPU (no GPU required) and downloads as a 7.5MB int8-quantized model file.

## Available Models

| Alias | Model | Dims | Size | Notes |
|-------|-------|------|------|-------|
| `model2vec` (default) | [minishlab/potion-base-8M](https://huggingface.co/minishlab/potion-base-8M) | 256 | 7.5MB (int8) | Best balance of speed and quality |
| `code-search` | [henrikalbihn/sgrep-code-v1](https://huggingface.co/henrikalbihn/sgrep-code-v1) | 256 | ~8MB (int8) | Fine-tuned for code search |
| `bge-small` | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 33MB | Higher quality, slower |
| `clip` | [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) | 512 | 350MB | Text + image search |
| `simple` | — | — | 0MB | No download, text-only heuristic (for testing) |

## Switching Models

```bash
sgrep --model bge-small "query" src/
sgrep --model code-search "query" src/
```

## How It Works

Model2Vec is a **static embedding** model: no attention layers, no transformer inference at query time. Each token is looked up in a precomputed embedding table, and the resulting vectors are mean-pooled and normalized into a single embedding. This makes it:

- **Fast**: sub-10ms for typical queries
- **Small**: 7.5MB int8 vs 33MB+ for transformer models
- **Offline**: no API calls, no network after first download
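
The sketch below illustrates that lookup, mean-pool, normalize pipeline in numpy. It is a conceptual example only, with a random table standing in for the trained Model2Vec weights, not sgrep's actual implementation:

```python
# Conceptual sketch of static embedding with numpy (not sgrep's actual code).
# `table` is a (vocab_size, 256) embedding matrix; a random one stands in for
# the trained Model2Vec weights here.
import numpy as np

def embed(token_ids: list[int], table: np.ndarray) -> np.ndarray:
    """Look up each token, mean-pool, and L2-normalize."""
    vectors = table[token_ids]               # (num_tokens, 256) row lookups
    pooled = vectors.mean(axis=0)            # average over tokens
    return pooled / np.linalg.norm(pooled)   # unit-length query/document vector

rng = np.random.default_rng(0)
table = rng.standard_normal((30_000, 256)).astype(np.float32)
print(embed([101, 2023, 2003, 102], table).shape)  # (256,)
```

Because the whole forward pass is a table lookup plus an average, query latency is dominated by tokenization and vector search rather than model inference.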

## Int8 Quantization

All models are automatically quantized to int8 on download. This reduces size by ~4x with minimal quality loss:

- f32 embeddings: ~30MB → int8: ~7.5MB
- Quality impact: under a 1% drop on retrieval benchmarks
- The quantization scale factor is stored in `config.json` alongside the model

HuggingFace models that ship as f32 are quantized at download time — you don't need to do anything.
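
As a rough illustration, here is what symmetric int8 quantization with a single scale factor looks like in numpy. This is a sketch of the general technique, not necessarily sgrep's exact scheme (which may quantize per row rather than per tensor), but it shows the ~4x size reduction and the dequantization round-trip:

```python
# Sketch of symmetric int8 quantization with a single scale factor.
# sgrep's exact scheme may differ; this just shows the ~4x size drop
# and the dequantization round-trip using the stored scale.
import numpy as np

weights = np.random.default_rng(1).standard_normal((30_000, 256)).astype(np.float32)

scale = float(np.abs(weights).max()) / 127.0          # the scale stored alongside the model
quantized = np.round(weights / scale).astype(np.int8)
restored = quantized.astype(np.float32) * scale       # what gets used at query time

print(f"{weights.nbytes / 2**20:.1f} MB f32 -> {quantized.nbytes / 2**20:.1f} MB int8")
print("max abs error:", float(np.abs(weights - restored).max()))
```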

## Model Cache

Models are downloaded automatically on first use and cached in `~/.cache/sgrep/models/`. To use a model from a local directory instead:

```bash
sgrep --model model2vec --model-path ./my-model/ "query" src/
```

The model directory should contain `model.safetensors`, `tokenizer.json`, and optionally `config.json`.
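
If you want to inspect such a directory by hand, the `safetensors` and `tokenizers` Python libraries can read both files. The sketch below assumes the layout described above; tensor key names depend on how the model was exported, so list them rather than relying on a specific key:

```python
# Sketch: reading a model directory by hand with the safetensors and
# tokenizers libraries. Tensor key names depend on how the model was
# exported, so list them instead of assuming a specific key.
import json
from pathlib import Path

from safetensors.numpy import load_file
from tokenizers import Tokenizer

model_dir = Path("./my-model")

tensors = load_file(str(model_dir / "model.safetensors"))      # name -> ndarray
tokenizer = Tokenizer.from_file(str(model_dir / "tokenizer.json"))

config_path = model_dir / "config.json"
config = json.loads(config_path.read_text()) if config_path.exists() else {}

print(list(tensors.keys()))                   # available tensor names
print(tokenizer.encode("semantic grep").ids)  # token ids for a sample query
```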