Zero-Shot Classification
The -e flag enables classification mode. Given comma-separated labels, sgrep classifies each chunk of text against all labels and outputs the best match with a confidence score.
How It Works
When you use -e "label1,label2,label3", sgrep:
- Embeds all labels as semantic vectors
- Chunks each input file (by line, paragraph, or whole file)
- Embeds each chunk and computes cosine similarity against every label
- Outputs the best-matching label + score for each chunk
No training required — works with any labels on any text.
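The steps above can be sketched in a few lines of Python. This is only an illustration of the technique, not sgrep's implementation: the `embed` function here is a toy stand-in (character-trigram counts) for a real embedding model, and the names `embed`, `cosine`, and `classify` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: character-trigram counts."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(chunk: str, labels: list[str]) -> tuple[str, float]:
    """Return the label most similar to the chunk, with its score."""
    chunk_vec = embed(chunk)
    scored = [(label, cosine(chunk_vec, embed(label))) for label in labels]
    return max(scored, key=lambda pair: pair[1])


labels = ["bug", "feature", "refactor", "docs"]
for line in ["Fix bug in parser", "Update docs for install"]:
    label, score = classify(line, labels)
    print(f"{label}\t{score:.2f}\t{line}")
```

A real embedding model would also match text that shares no characters with the label (for example, "crash on startup" against "bug"), which the trigram stand-in cannot do.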
Basic Usage
# Classify each line of a file
sgrep -e "bug,feature,refactor,docs" CHANGELOG.md
# Classify by paragraph
sgrep -e "authentication,database,networking" src/ --granularity paragraph
# Classify whole files
sgrep -e "frontend,backend,config" src/ --granularity file
# Custom chunk size (5 lines per chunk)
sgrep -e "error,warning,info" app.log --chunk-size 5
Hybrid Classification
Short labels like “auth” may not match “authentication” in pure semantic mode. Use --hybrid to add keyword matching:
# Pure semantic (may miss abbreviations)
sgrep -e "auth,db,config" src/
# Hybrid (catches abbreviations via keyword matching)
sgrep -e "auth,db,config" --hybrid src/
# Tune the blend (more keyword weight)
sgrep -e "auth,db,config" --hybrid --alpha 0.5 src/
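One common way to blend the two signals is a weighted sum. The sketch below assumes that `alpha` is the weight given to the keyword score and that a keyword match is a simple case-insensitive substring test; neither detail is confirmed by this page, and the semantic score of 0.30 is a placeholder value rather than real sgrep output.

```python
def keyword_score(label: str, chunk: str) -> float:
    """1.0 if the label appears in the chunk as a substring, else 0.0 (assumed rule)."""
    return 1.0 if label.lower() in chunk.lower() else 0.0


def hybrid_score(semantic: float, keyword: float, alpha: float) -> float:
    """Weighted blend: alpha is assumed to be the keyword weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * keyword + (1.0 - alpha) * semantic


chunk = "fn authenticate(user: &User) -> Result<Token>"
semantic = 0.30  # placeholder: a short label can score low semantically
blended = hybrid_score(semantic, keyword_score("auth", chunk), alpha=0.5)
print(f"semantic only: {semantic:.2f}, hybrid: {blended:.2f}")
```

Under these assumptions, the substring hit for "auth" lifts the chunk's score even though the semantic score alone is low, which is the behavior the abbreviation example relies on.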
Use Cases
Changelog Categorization
Automatically label changelog entries by type:
# Find breaking changes
sgrep -e "breaking change" CHANGELOG.md
# Find feature additions
sgrep -e "new feature" CHANGELOG.md
# Find bug fixes
sgrep -e "bug fix" CHANGELOG.md
Output:
{"score":0.91,"text":"- BREAKING: Remove deprecated API endpoints"}
{"score":0.74,"text":"- Add new authentication middleware"}
{"score":0.68,"text":"- Fix memory leak in worker pool"}
Code Categorization
Identify different types of code patterns:
# Find database queries
sgrep -e "SQL query" src/
# Find API endpoints
sgrep -e "HTTP route handler" src/
# Find test files
sgrep -e "unit test" src/
Log Severity Classification
Classify log messages by severity:
# Critical errors
sgrep -e "critical failure" app.log | jq 'select(.score > 0.7)'
# Warnings
sgrep -e "warning message" app.log | jq 'select(.score > 0.6)'
Tip: Combine with jq for powerful filtering:
sgrep -e "security issue" src/ | jq 'select(.score > 0.75)'
JSON Output Format
Each line outputs a JSON object:
{"score":0.85,"text":"the original text line"}
- score: Float between 0 and 1; higher means more relevant
- text: The original line that was classified
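Because each output line is a standalone JSON object, the stream is easy to consume from a script as well as from jq. The following sketch parses lines in the format shown above and keeps records above a score threshold; the function name `filter_by_score` is hypothetical, and the sample lines are copied from the changelog example on this page.

```python
import json


def filter_by_score(lines: list[str], min_score: float) -> list[dict]:
    """Parse JSON-lines output and keep records scoring above min_score."""
    kept = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the stream
        record = json.loads(raw)
        if record["score"] > min_score:
            kept.append(record)
    return kept


sample_output = [
    '{"score":0.91,"text":"- BREAKING: Remove deprecated API endpoints"}',
    '{"score":0.74,"text":"- Add new authentication middleware"}',
    '{"score":0.68,"text":"- Fix memory leak in worker pool"}',
]
for record in filter_by_score(sample_output, 0.7):
    print(record["score"], record["text"])
```

This mirrors what `jq 'select(.score > 0.7)'` does in the shell examples.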
Pipeline Integration
Use -e in shell pipelines for sophisticated text processing:
# Find all authentication-related code
find src -name "*.rs" -print0 | xargs -0 cat | sgrep -e "authentication" | jq -r '.text' | sort -u
# Classify git commits by type
git log --oneline | sgrep -e "performance improvement"
Thresholding
Set a minimum similarity threshold:
# Only show high-confidence matches (0.7+)
sgrep -e "security vulnerability" src/ | jq 'select(.score > 0.7)'
# Invert: find low-similarity lines
sgrep -e "documentation" src/ | jq 'select(.score < 0.5)'
Batch Processing
Process multiple files and aggregate results:
# Classify all Rust files (in bash, run `shopt -s globstar` first so ** recurses)
for file in src/**/*.rs; do
  sgrep -e "TODO comment" "$file" | jq --arg f "$file" '. + {file: $f}'
done
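The same aggregation can be scripted outside the shell. This is a sketch under stated assumptions: it invokes `sgrep -e LABEL FILE` exactly as in the loop above and expects one JSON object per output line; the helper names `run_sgrep` and `classify_files` are hypothetical, and the runner is injectable so the logic can be exercised without sgrep installed.

```python
import json
import subprocess
from typing import Callable, Iterable


def run_sgrep(label: str, path: str) -> str:
    """Run `sgrep -e LABEL FILE` and return its stdout."""
    proc = subprocess.run(
        ["sgrep", "-e", label, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


def classify_files(
    paths: Iterable[str],
    label: str,
    runner: Callable[[str, str], str] = run_sgrep,
) -> list[dict]:
    """Collect records from every file, tagging each with its source path."""
    records = []
    for path in paths:
        for raw in runner(label, str(path)).splitlines():
            if raw.strip():
                # Same effect as jq's `. + {file: $f}`.
                records.append({**json.loads(raw), "file": str(path)})
    return records
```

A typical call might pass `sorted(pathlib.Path("src").rglob("*.rs"))` as `paths` and "TODO comment" as `label`, then sort the returned records by score.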
Warning: The -e flag reads input line-by-line. For multi-line context, consider using --json mode instead.
Comparison: -e vs --json
| Feature | -e | --json |
|---|---|---|
| Purpose | Classification | Search with metadata |
| Output | One JSON object per line | Single JSON array |
| Scores | Always included | Optional |
| Context | Single line | Full match context |
Use -e when you want to classify text. Use --json when you want to search with structured output.