Zero-Shot Classification

The -e flag enables classification mode. Given comma-separated labels, sgrep classifies each chunk of text against all labels and outputs the best match with a confidence score.

How It Works

When you use -e "label1,label2,label3", sgrep:

  1. Embeds all labels as semantic vectors
  2. Chunks each input file (by line, paragraph, or whole file)
  3. Embeds each chunk and computes cosine similarity against every label
  4. Outputs the best-matching label + score for each chunk

No training required — works with any labels on any text.
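
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not sgrep's actual implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the names `embed`, `cosine`, and `classify` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(chunks, labels):
    """Return a (best_label, score, chunk) tuple for each chunk."""
    # Step 1: embed every label once.
    label_vecs = {label: embed(label) for label in labels}
    results = []
    # Step 2 (chunking) is assumed to have been done by the caller.
    for chunk in chunks:
        # Step 3: embed the chunk and score it against every label.
        vec = embed(chunk)
        scores = {label: cosine(vec, lv) for label, lv in label_vecs.items()}
        # Step 4: keep the best-matching label and its score.
        best = max(scores, key=scores.get)
        results.append((best, scores[best], chunk))
    return results


chunks = ["fix bug in parser", "add docs for install"]
for label, score, chunk in classify(chunks, ["bug", "docs"]):
    print(f"{label}\t{score:.2f}\t{chunk}")
```

A real embedding model would also score related words that share no tokens with the label, which this word-count stand-in cannot do.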

Basic Usage

# Classify each line of a file
sgrep -e "bug,feature,refactor,docs" CHANGELOG.md

# Classify by paragraph
sgrep -e "authentication,database,networking" src/ --granularity paragraph

# Classify whole files
sgrep -e "frontend,backend,config" src/ --granularity file

# Custom chunk size (5 lines per chunk)
sgrep -e "error,warning,info" app.log --chunk-size 5
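
To make the granularity options concrete, here is a rough Python sketch of how input might be split into chunks before classification. The splitting rules (blank lines separate paragraphs, `chunk_size` groups consecutive lines) are assumptions for illustration, and `chunk_text` is a hypothetical helper rather than part of sgrep.

```python
import re


def chunk_text(text, granularity="line", chunk_size=None):
    """Split text into chunks by line, paragraph, whole file, or fixed line count."""
    lines = text.splitlines()
    if chunk_size:
        # Group consecutive lines, chunk_size lines per chunk.
        groups = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
        return ["\n".join(group) for group in groups]
    if granularity == "file":
        return [text]
    if granularity == "paragraph":
        # Assumption: paragraphs are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Default: one chunk per non-empty line.
    return [line for line in lines if line.strip()]


sample = "first line\nsecond line\n\nthird line\n"
print(chunk_text(sample))
print(chunk_text(sample, granularity="paragraph"))
print(chunk_text("1\n2\n3\n4\n5", chunk_size=2))
```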

Hybrid Classification

Short labels like “auth” may not match “authentication” in pure semantic mode. Use --hybrid to add keyword matching:

# Pure semantic (may miss abbreviations)
sgrep -e "auth,db,config" src/

# Hybrid (catches abbreviations via keyword matching)
sgrep -e "auth,db,config" --hybrid src/

# Tune the blend (more keyword weight)
sgrep -e "auth,db,config" --hybrid --alpha 0.5 src/
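
One plausible way to blend the two signals is a weighted sum. The Python sketch below assumes that alpha is the weight on the semantic score, so a lower alpha gives keywords more influence. That reading, the substring-based keyword score, and the default of 0.7 are all assumptions; this page does not specify sgrep's formula.

```python
def keyword_score(label, text):
    """Return 1.0 if the label occurs in the text (case-insensitive), else 0.0."""
    return 1.0 if label.lower() in text.lower() else 0.0


def hybrid_score(semantic, keyword, alpha=0.7):
    """Linear blend; alpha is the (assumed) weight on the semantic score."""
    return alpha * semantic + (1 - alpha) * keyword


text = "Add new authentication middleware"
semantic = 0.30  # hypothetical semantic similarity between "auth" and the text
keyword = keyword_score("auth", text)

print(f"alpha=0.7: {hybrid_score(semantic, keyword, alpha=0.7):.2f}")
print(f"alpha=0.5: {hybrid_score(semantic, keyword, alpha=0.5):.2f}")
```

Under these assumptions, a weak semantic match for an abbreviation is lifted by the exact keyword hit, and lowering alpha lifts it further.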

Use Cases

Changelog Categorization

Automatically label changelog entries by type:

# Find breaking changes
sgrep -e "breaking change" CHANGELOG.md

# Find feature additions
sgrep -e "new feature" CHANGELOG.md

# Find bug fixes
sgrep -e "bug fix" CHANGELOG.md

Example output (one sample line from each command above):

{"score":0.91,"text":"- BREAKING: Remove deprecated API endpoints"}
{"score":0.74,"text":"- Add new authentication middleware"}
{"score":0.68,"text":"- Fix memory leak in worker pool"}

Code Categorization

Identify different types of code patterns:

# Find database queries
sgrep -e "SQL query" src/

# Find API endpoints
sgrep -e "HTTP route handler" src/

# Find test files
sgrep -e "unit test" src/

Log Severity Classification

Classify log messages by severity:

# Critical errors
sgrep -e "critical failure" app.log | jq 'select(.score > 0.7)'

# Warnings
sgrep -e "warning message" app.log | jq 'select(.score > 0.6)'

Tip: Pipe the output to jq to filter results by score:

sgrep -e "security issue" src/ | jq 'select(.score > 0.75)'

JSON Output Format

Each line outputs a JSON object:

{"score":0.85,"text":"the original text line"}
  • score: Float between 0 and 1, higher means more relevant
  • text: The original line that was classified
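
Since each output line is a standalone JSON object, the results are easy to consume from a script as well as from jq. The Python sketch below parses lines in the format shown above and keeps those above a score threshold. `filter_by_score` is a hypothetical helper, and the sample lines are copied from the changelog example on this page.

```python
import json


def filter_by_score(lines, threshold):
    """Parse JSON lines and keep records whose score exceeds the threshold."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record["score"] > threshold:
            results.append(record)
    return results


sample = [
    '{"score":0.91,"text":"- BREAKING: Remove deprecated API endpoints"}',
    '{"score":0.74,"text":"- Add new authentication middleware"}',
    '{"score":0.68,"text":"- Fix memory leak in worker pool"}',
]

for record in filter_by_score(sample, 0.7):
    print(record["score"], record["text"])
```

In practice the lines would come from sgrep's standard output, for example by iterating over `sys.stdin`.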

Pipeline Integration

Because -e writes one JSON object per line, it composes well with standard shell tools:

# Find all authentication-related code
find src -name "*.rs" -exec cat {} + | sgrep -e "authentication" | jq -r '.text' | sort -u

# Classify git commits by type
git log --oneline | sgrep -e "performance improvement"

Thresholding

Filter on the score field with jq to apply a minimum similarity threshold:

# Only show high-confidence matches (score above 0.7)
sgrep -e "security vulnerability" src/ | jq 'select(.score > 0.7)'

# Invert: find low-similarity lines
sgrep -e "documentation" src/ | jq 'select(.score < 0.5)'

Batch Processing

Process multiple files and aggregate results:

# Classify all Rust files (the ** glob needs globstar in bash)
shopt -s globstar
for file in src/**/*.rs; do
  sgrep -e "TODO comment" "$file" | jq -c --arg f "$file" '. + {file: $f}'
done

Warning: By default, the -e flag classifies input line by line. For multi-line context, use --granularity paragraph or --chunk-size, or consider --json mode instead.
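
Once every record carries a file field, as the loop above adds, the results can be aggregated per file. The Python sketch below assumes the loop's output was saved as one compact JSON object per line (jq's -c flag produces that form). `summarize` and the sample records are hypothetical.

```python
import json
from collections import defaultdict


def summarize(lines, threshold=0.5):
    """Per file: count records at or above the threshold and track the max score."""
    stats = defaultdict(lambda: {"matches": 0, "max_score": 0.0})
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        entry = stats[record["file"]]
        entry["max_score"] = max(entry["max_score"], record["score"])
        if record["score"] >= threshold:
            entry["matches"] += 1
    return dict(stats)


sample = [
    '{"score":0.8,"text":"// TODO: handle errors","file":"src/main.rs"}',
    '{"score":0.2,"text":"fn main() {","file":"src/main.rs"}',
    '{"score":0.6,"text":"// TODO: add tests","file":"src/lib.rs"}',
]

for path, entry in summarize(sample).items():
    print(path, entry["matches"], entry["max_score"])
```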

Comparison: -e vs --json

Feature    -e                   --json
Purpose    Classification       Search with metadata
Output     One JSON per line    Single JSON array
Scores     Always included      Optional
Context    Single line          Full match context

Use -e when you want to classify text. Use --json when you want to search with structured output.
