# Zero-Shot Classification

The `-e` flag enables **classification mode**. Given comma-separated labels, sgrep classifies each chunk of text against all labels and outputs the best match with a confidence score.

## How It Works

When you use `-e "label1,label2,label3"`, sgrep:

1. Embeds all labels as semantic vectors
2. Chunks each input file (by line, paragraph, or whole file)
3. Embeds each chunk and computes cosine similarity against every label
4. Outputs the best-matching label + score for each chunk

No training required — works with any labels on any text.
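
The steps above can be sketched in a few lines of Python. This is a minimal illustration and not sgrep's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, so it captures word overlap only, not meaning.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(chunks, labels):
    """Return a (chunk, best_label, score) tuple for every chunk."""
    # Step 1: embed every label once.
    label_vecs = {label: toy_embed(label) for label in labels}
    results = []
    # Step 2 (chunking) is assumed to have been done by the caller.
    for chunk in chunks:
        # Step 3: embed the chunk and compare it with every label.
        vec = toy_embed(chunk)
        scores = {label: cosine(vec, lv) for label, lv in label_vecs.items()}
        # Step 4: keep the best-matching label and its score.
        best = max(scores, key=scores.get)
        results.append((chunk, best, scores[best]))
    return results


results = classify(["fix bug in parser", "add docs for the cli"], ["bug", "docs"])
for chunk, label, score in results:
    print(f"{label}\t{score:.2f}\t{chunk}")
```

With a real embedding model, only `toy_embed` would change; the similarity and best-label selection stay the same.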

## Basic Usage

```bash
# Classify each line of a file
sgrep -e "bug,feature,refactor,docs" CHANGELOG.md

# Classify by paragraph
sgrep -e "authentication,database,networking" src/ --granularity paragraph

# Classify whole files
sgrep -e "frontend,backend,config" src/ --granularity file

# Custom chunk size (5 lines per chunk)
sgrep -e "error,warning,info" app.log --chunk-size 5
```
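
To make the granularity options concrete, here is a rough Python sketch of the chunking strategies. The function and parameter names are hypothetical and merely mirror the CLI flags; how sgrep treats blank lines is not documented here, so skipping them is an assumption.

```python
import re


def chunk_text(text, granularity="line", chunk_size=0):
    """Split text into chunks (hypothetical helper, not sgrep's code)."""
    # Assumption: blank lines carry no content and are skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if chunk_size > 0:
        # Fixed-size chunks of `chunk_size` consecutive lines.
        groups = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
        return ["\n".join(group) for group in groups]
    if granularity == "line":
        return lines
    if granularity == "paragraph":
        # Paragraphs are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if granularity == "file":
        return [text]
    raise ValueError(f"unknown granularity: {granularity!r}")


text = "alpha\nbeta\n\ngamma\n"
by_line = chunk_text(text)
by_paragraph = chunk_text(text, granularity="paragraph")
by_file = chunk_text(text, granularity="file")
by_size = chunk_text(text, chunk_size=2)
print(by_line, by_paragraph, by_size, sep="\n")
```

Larger chunks give the embedding more context but produce one label for more text, so the right granularity depends on how mixed the content is.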

## Hybrid Classification

Short labels like "auth" may not match "authentication" in pure semantic mode. Use `--hybrid` to add keyword matching:

```bash
# Pure semantic (may miss abbreviations)
sgrep -e "auth,db,config" src/

# Hybrid (catches abbreviations via keyword matching)
sgrep -e "auth,db,config" --hybrid src/

# Tune the blend (more keyword weight)
sgrep -e "auth,db,config" --hybrid --alpha 0.5 src/
```
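
One plausible way to picture the blend is a weighted average of a keyword score and a semantic score. The sketch below is an assumption, not sgrep's documented formula: it treats `alpha` as the weight on the keyword score, uses a simple substring test for keyword matching, and the semantic score of 0.30 is an illustrative value.

```python
def keyword_score(label, text):
    """1.0 if the label occurs as a case-insensitive substring, else 0.0."""
    return 1.0 if label.lower() in text.lower() else 0.0


def hybrid_score(semantic, keyword, alpha):
    """Blend two scores. ASSUMPTION: alpha is the weight on the keyword score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * keyword + (1.0 - alpha) * semantic


chunk = "Authentication middleware for the login route"
semantic = 0.30  # illustrative value, not produced by a real model
keyword = keyword_score("auth", chunk)  # "auth" is a substring of "Authentication"
blended = hybrid_score(semantic, keyword, alpha=0.5)
print(f"semantic={semantic:.2f} keyword={keyword:.1f} blended={blended:.2f}")
```

Under this assumption, a short label that appears literally in the chunk is pulled up even when its embedding is a weak match.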

## Use Cases

### Changelog Categorization

Score changelog entries against one label at a time to pull out each type of change:

```bash
# Find breaking changes
sgrep -e "breaking change" CHANGELOG.md

# Find feature additions
sgrep -e "new feature" CHANGELOG.md

# Find bug fixes
sgrep -e "bug fix" CHANGELOG.md
```

Example output, showing one high-scoring line from each command:
```json
{"score":0.91,"text":"- BREAKING: Remove deprecated API endpoints"}
{"score":0.74,"text":"- Add new authentication middleware"}
{"score":0.68,"text":"- Fix memory leak in worker pool"}
```

### Code Categorization

Identify different types of code patterns:

```bash
# Find database queries
sgrep -e "SQL query" src/

# Find API endpoints
sgrep -e "HTTP route handler" src/

# Find test code
sgrep -e "unit test" src/
```

### Log Severity Classification

Classify log messages by severity:

```bash
# Critical errors
sgrep -e "critical failure" app.log | jq 'select(.score > 0.7)'

# Warnings
sgrep -e "warning message" app.log | jq 'select(.score > 0.6)'
```

> **Tip**: Combine with `jq` for powerful filtering:
> ```bash
> sgrep -e "security issue" src/ | jq 'select(.score > 0.75)'
> ```

## JSON Output Format

Each output line is a JSON object:

```json
{"score":0.85,"text":"the original text line"}
```

- `score`: Float between 0 and 1; higher means a closer match to the label
- `text`: The original chunk (a single line by default) that was classified
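
For consumers that prefer Python over `jq`, the line-delimited format can be read with the standard library alone. This is a minimal sketch: the `Match` class and `parse_matches` helper are hypothetical names, and the second sample record is made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Match:
    score: float
    text: str


def parse_matches(stream):
    """Parse one JSON object per non-empty line into Match records."""
    matches = []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        matches.append(Match(score=float(obj["score"]), text=obj["text"]))
    return matches


# Sample input in the documented shape; the second record is invented.
sample = [
    '{"score":0.85,"text":"the original text line"}',
    '{"score":0.42,"text":"another line"}',
]
matches = parse_matches(sample)
best = max(matches, key=lambda m: m.score)
print(best.text)
```

In a real pipeline, `parse_matches(sys.stdin)` would take the place of the sample list, since any iterable of lines works.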

## Pipeline Integration

Because `-e` emits one JSON object per line, it composes with standard shell tools:

```bash
# Find all authentication-related code
find src -name "*.rs" -exec cat {} + | sgrep -e "authentication" | jq -r '.text' | sort -u

# Classify git commits by type
git log --oneline | sgrep -e "performance improvement"
```

## Thresholding

Apply a minimum similarity threshold by filtering on `score` with `jq`:

```bash
# Only show high-confidence matches (score above 0.7)
sgrep -e "security vulnerability" src/ | jq 'select(.score > 0.7)'

# Invert: find low-similarity lines
sgrep -e "documentation" src/ | jq 'select(.score < 0.5)'
```

## Batch Processing

Process multiple files and tag each result with its source file:

```bash
# Classify all Rust files (bash 4+; globstar makes ** match recursively)
shopt -s globstar
for file in src/**/*.rs; do
  sgrep -e "TODO comment" "$file" | jq --arg f "$file" '. + {file: $f}'
done
```
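
Once each record carries a `file` field, the results can be aggregated per file. The following is a small sketch of one way to do that; the helper name and the sample records are hypothetical, shaped like the output of the loop above.

```python
import json
from collections import defaultdict


def summarize_by_file(lines, threshold=0.0):
    """Return {file: (match_count, best_score)} for records scoring above threshold."""
    counts = defaultdict(int)
    best = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec["score"] > threshold:
            counts[rec["file"]] += 1
            best[rec["file"]] = max(best[rec["file"]], rec["score"])
    return {path: (counts[path], best[path]) for path in sorted(counts)}


# Made-up sample records, each tagged with its source file.
sample = [
    '{"score":0.82,"text":"// TODO: handle timeout","file":"src/net.rs"}',
    '{"score":0.31,"text":"let x = 1;","file":"src/net.rs"}',
    '{"score":0.77,"text":"// TODO remove","file":"src/main.rs"}',
]
summary = summarize_by_file(sample, threshold=0.7)
for path, (count, top) in summary.items():
    print(f"{path}\t{count}\t{top:.2f}")
```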

> **Warning**: By default, `-e` classifies input line by line. For multi-line context, use `--granularity paragraph` or `--chunk-size`, or consider `--json` mode instead.

## Comparison: `-e` vs `--json`

| Feature | `-e` | `--json` |
|---------|------|----------|
| Purpose | Classification | Search with metadata |
| Output | One JSON per line | Single JSON array |
| Scores | Always included | Optional |
| Context | Single chunk (one line by default) | Full match context |

Use `-e` when you want to **classify** text. Use `--json` when you want to **search** with structured output.