Zero-Shot Classification

The -e flag enables classification mode. Given comma-separated labels, sgrep classifies each chunk of text against all labels and outputs the best match with a confidence score.

How It Works

When you use -e "label1,label2,label3", sgrep:

1. Embeds all labels as semantic vectors
2. Chunks each input file (by line, paragraph, or whole file)
3. Embeds each chunk and computes cosine similarity against every label
4. Outputs the best-matching label + score for each chunk

No training required — works with any labels on any text.
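
The scoring step can be illustrated with a minimal Python sketch. It is not sgrep's implementation: the three-dimensional vectors below are hand-written stand-ins for real embeddings, and the function names `cosine` and `classify` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def classify(chunk_vec, label_vecs):
    """Return (label, score) for the label most similar to the chunk."""
    best_label, best_score = None, -1.0
    for label, vec in label_vecs.items():
        score = cosine(chunk_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy vectors (assumption): a real model would produce hundreds of dimensions.
label_vecs = {
    "bug": [1.0, 0.0, 0.0],
    "feature": [0.0, 1.0, 0.0],
    "docs": [0.0, 0.0, 1.0],
}
chunk_vec = [0.9, 0.1, 0.0]  # stand-in for an embedded changelog line

label, score = classify(chunk_vec, label_vecs)
print(label, round(score, 3))
```

Because the labels are only embedded, never trained on, changing the label set requires nothing more than embedding the new strings.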

Basic Usage

# Classify each line of a file
sgrep -e "bug,feature,refactor,docs" CHANGELOG.md

# Classify by paragraph
sgrep -e "authentication,database,networking" src/ --granularity paragraph

# Classify whole files
sgrep -e "frontend,backend,config" src/ --granularity file

# Custom chunk size (5 lines per chunk)
sgrep -e "error,warning,info" app.log --chunk-size 5
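
To make the granularity options concrete, here is a rough Python sketch of how text might be split into chunks. The rules are assumptions, not sgrep's documented behavior: paragraphs are taken to be separated by blank lines, and a chunk size groups consecutive lines. `chunk_text` is a hypothetical helper.

```python
import re

def chunk_text(text, granularity="line", chunk_size=None):
    """Split text into chunks. chunk_size, when given, overrides granularity."""
    lines = text.splitlines()
    if chunk_size is not None:
        # Group every chunk_size consecutive lines into one chunk.
        return ["\n".join(lines[i:i + chunk_size])
                for i in range(0, len(lines), chunk_size)]
    if granularity == "line":
        # One chunk per non-empty line.
        return [line for line in lines if line.strip()]
    if granularity == "paragraph":
        # Assumption: a blank line separates paragraphs.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if granularity == "file":
        return [text]
    raise ValueError(f"unknown granularity: {granularity}")

sample = "first line\nsecond line\n\nthird line\n"
print(chunk_text(sample, "line"))
print(chunk_text(sample, "paragraph"))
```

Larger chunks give the classifier more context per decision but yield fewer, coarser results.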

Hybrid Classification

Short labels like “auth” may not match “authentication” in pure semantic mode. Use --hybrid to add keyword matching:

# Pure semantic (may miss abbreviations)
sgrep -e "auth,db,config" src/

# Hybrid (catches abbreviations via keyword matching)
sgrep -e "auth,db,config" --hybrid src/

# Tune the blend (more keyword weight)
sgrep -e "auth,db,config" --hybrid --alpha 0.5 src/
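
One way to picture the blend is as a weighted average of a semantic score and a keyword score. The Python sketch below is a guess at the idea, not sgrep's formula: it assumes --alpha is the weight given to the keyword score, and it uses a simple case-insensitive substring test as the keyword match. Both function names are hypothetical.

```python
def keyword_score(label, chunk):
    """1.0 if the label occurs in the chunk (case-insensitive), else 0.0."""
    return 1.0 if label.lower() in chunk.lower() else 0.0

def hybrid_score(semantic, keyword, alpha):
    """Blend the two scores; alpha is assumed to be the keyword weight."""
    return (1.0 - alpha) * semantic + alpha * keyword

chunk = "fn authenticate_user(token: &str)"
semantic = 0.4  # made-up semantic similarity between "auth" and the chunk

# "auth" is a substring of "authenticate_user", so the keyword term lifts the score.
blended = hybrid_score(semantic, keyword_score("auth", chunk), alpha=0.5)
print(round(blended, 2))
```

Under this assumption, an alpha of 0 would reduce to pure semantic matching and an alpha of 1 to pure keyword matching.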

Use Cases

Changelog Categorization

Automatically label changelog entries by type:

# Find breaking changes
sgrep -e "breaking change" CHANGELOG.md

# Find feature additions
sgrep -e "new feature" CHANGELOG.md

# Find bug fixes
sgrep -e "bug fix" CHANGELOG.md

Output:

{"score":0.91,"text":"- BREAKING: Remove deprecated API endpoints"}
{"score":0.74,"text":"- Add new authentication middleware"}
{"score":0.68,"text":"- Fix memory leak in worker pool"}

Code Categorization

Identify different types of code patterns:

# Find database queries
sgrep -e "SQL query" src/

# Find API endpoints
sgrep -e "HTTP route handler" src/

# Find test files
sgrep -e "unit test" src/

Log Severity Classification

Classify log messages by severity:

# Critical errors
sgrep -e "critical failure" app.log | jq 'select(.score > 0.7)'

# Warnings
sgrep -e "warning message" app.log | jq 'select(.score > 0.6)'

Tip: Combine with jq for powerful filtering:
sgrep -e "security issue" src/ | jq 'select(.score > 0.75)'

JSON Output Format

Each output line is a JSON object:

{"score":0.85,"text":"the original text line"}

score: Float between 0 and 1, higher means more relevant
text: The original line that was classified
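
Since every output line is a standalone JSON object, the stream is easy to consume from a script as well as from jq. A minimal Python sketch, assuming only the two fields described above (`parse_results` and `above` are hypothetical helper names, and the sample lines are the changelog output shown earlier):

```python
import json

def parse_results(lines):
    """Parse one JSON object per non-empty line."""
    results = []
    for line in lines:
        line = line.strip()
        if line:
            results.append(json.loads(line))
    return results

def above(results, threshold):
    """Keep results whose score exceeds the threshold."""
    return [r for r in results if r["score"] > threshold]

sample_output = [
    '{"score":0.91,"text":"- BREAKING: Remove deprecated API endpoints"}',
    '{"score":0.74,"text":"- Add new authentication middleware"}',
    '{"score":0.68,"text":"- Fix memory leak in worker pool"}',
]

results = parse_results(sample_output)
for r in above(results, 0.7):
    print(r["score"], r["text"])
```

In practice the lines would come from the command's standard output, for example by iterating over `sys.stdin` in a script placed at the end of a pipeline.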

Pipeline Integration

Use -e in shell pipelines for sophisticated text processing:

# Find all authentication-related code
find src -name "*.rs" -exec cat {} + | sgrep -e "authentication" | jq -r '.text' | sort -u

# Classify git commits by type
git log --oneline | sgrep -e "performance improvement"

Thresholding

Apply a minimum similarity threshold by filtering on score with jq:

# Only show high-confidence matches (0.7+)
sgrep -e "security vulnerability" src/ | jq 'select(.score >= 0.7)'

# Invert: find low-similarity lines
sgrep -e "documentation" src/ | jq 'select(.score < 0.5)'

Batch Processing

Process multiple files and aggregate results:

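# Classify all Rust files (in bash, the ** glob requires globstar)
shopt -s globstar
for file in src/**/*.rs; do
  # -c keeps jq's output at one JSON object per line
  sgrep -e "TODO comment" "$file" | jq -c --arg f "$file" '. + {file: $f}'
done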
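
Once each record carries a file field, as added by the jq step in the loop above, the results can be aggregated per file. The Python sketch below is one possible approach; `summarize` is a hypothetical helper, and the file names and scores in the sample are invented for illustration.

```python
import json
from collections import defaultdict

def summarize(json_lines, threshold=0.7):
    """Per file: count of records at or above threshold, and the top score."""
    by_file = defaultdict(list)
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record["score"] >= threshold:
            by_file[record["file"]].append(record["score"])
    return {
        path: {"count": len(scores), "max_score": max(scores)}
        for path, scores in by_file.items()
    }

# Invented sample records in the shape produced by the loop above.
sample = [
    '{"score":0.82,"text":"// TODO: handle timeout","file":"src/net.rs"}',
    '{"score":0.31,"text":"let x = 1;","file":"src/net.rs"}',
    '{"score":0.77,"text":"// TODO: add tests","file":"src/db.rs"}',
    '{"score":0.90,"text":"// TODO: remove hack","file":"src/net.rs"}',
]

for path, stats in summarize(sample).items():
    print(path, stats["count"], stats["max_score"])
```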

Warning: By default, the -e flag classifies input line by line. For multi-line context, use --granularity paragraph or --chunk-size (see Basic Usage), or consider --json mode instead.

Comparison: `-e` vs `--json`

| Feature | `-e` | `--json` |
|---------|------|----------|
| Purpose | Classification | Search with metadata |
| Output | One JSON per line | Single JSON array |
| Scores | Always included | Optional |
| Context | Single line | Full match context |

Use -e when you want to classify text. Use --json when you want to search with structured output.