# CI Pipelines
sgrep fits naturally into continuous integration pipelines. Use semantic search to enforce code quality standards, detect patterns, and gate deployments.
## GitHub Actions

### Basic Quality Gate
Fail the pipeline if sensitive patterns are found:
```yaml
name: Security Scan
on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install sgrep
        run: curl -sSf https://sgrep.sh/install.sh | sh
      - name: Scan for hardcoded secrets
        run: |
          if sgrep --json "hardcoded API key or secret" src/ | jq -e 'length > 0' > /dev/null; then
            echo "Found potential hardcoded secrets"
            exit 1
          fi
```

Note the `-e` flag: plain `jq` exits 0 whether it prints `true` or `false`, so without `-e` the `if` would always trigger.
### PR Comment on Matches
Leave a comment on pull requests when issues are found:
```yaml
name: Code Review
on: pull_request

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install sgrep
        run: curl -sSf https://sgrep.sh/install.sh | sh
      - name: Check for TODO comments
        id: scan
        run: |
          result=$(sgrep --json "TODO or FIXME" src/)
          echo "result<<EOF" >> "$GITHUB_OUTPUT"
          echo "$result" >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"
      - uses: actions/github-script@v7
        if: steps.scan.outputs.result != '[]'
        env:
          SCAN_RESULT: ${{ steps.scan.outputs.result }}
        with:
          script: |
            const result = JSON.parse(process.env.SCAN_RESULT);
            const body = result.map(r => `- **${r.path}:${r.line}**\n  ${r.text}`).join('\n');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Found ${result.length} TODOs:\n${body}`
            });
```

The heredoc lines must all append (`>>`) to `$GITHUB_OUTPUT`, and the result is passed through an environment variable rather than interpolated directly into the script, which would break on multiline JSON.
### Caching the Model
Speed up builds by caching the downloaded model:
```yaml
- name: Cache sgrep model
  uses: actions/cache@v4
  with:
    path: ~/.cache/sgrep
    key: sgrep-model-${{ hashFiles('**/Cargo.lock') }}
```
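On a cache miss, the model can be fetched eagerly so the first real query doesn't pay the download cost. A sketch, assuming the model is downloaded on first use; the `model-cache` id and the warm-up query are arbitrary choices, not part of sgrep itself:

```yaml
- name: Cache sgrep model
  id: model-cache
  uses: actions/cache@v4
  with:
    path: ~/.cache/sgrep
    key: sgrep-model-${{ hashFiles('**/Cargo.lock') }}
- name: Warm model cache
  if: steps.model-cache.outputs.cache-hit != 'true'
  run: sgrep --json "warm up" README.md > /dev/null || true
```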
## GitLab CI

### Security Scanning
```yaml
stages:
  - test

security:
  stage: test
  image: rust:latest
  before_script:
    - cargo install sgrep --git https://github.com/henrikalbihn/sgrep
  script:
    # Scan for sensitive patterns
    - |
      matches=$(sgrep --json "password or credential or secret" src/)
      if [ "$matches" != "[]" ]; then
        echo "Security issues found:"
        echo "$matches" | jq -r '.[] | "\(.path):\(.line): \(.text)"'
        exit 1
      fi
  allow_failure: false
```
### Quality Metrics
Track code quality over time:
```yaml
quality:
  stage: test
  script:
    - cargo install sgrep
    # Count complex functions (heuristic)
    - complex=$(sgrep --json "complex function or nested logic" src/ | jq 'length')
    # GitLab metrics reports expect the OpenMetrics text format: "name value"
    - echo "complex_functions $complex" > metrics.txt
  artifacts:
    reports:
      metrics: metrics.txt
```
## Jenkins Pipeline
```groovy
pipeline {
    agent any
    stages {
        stage('Install sgrep') {
            steps {
                sh 'curl -sSf https://sgrep.sh/install.sh | sh'
            }
        }
        stage('Code Scan') {
            steps {
                script {
                    def matches = sh(
                        script: 'sgrep --json "deprecated API usage" src/',
                        returnStdout: true
                    )
                    // readJSON requires the Pipeline Utility Steps plugin
                    def count = readJSON(text: matches).size()
                    if (count > 0) {
                        unstable("Found ${count} deprecated API uses")
                    }
                }
            }
        }
    }
}
```
## Parsing JSON Output

The `--json` flag outputs structured data that's easy to parse:
```bash
# Count total matches
count=$(sgrep --json "pattern" src/ | jq 'length')

# Get only file paths
sgrep --json "pattern" src/ | jq -r '.[].path' | sort -u

# Extract specific fields
sgrep --json "pattern" src/ | jq -r '.[] | "\(.path):\(.line)"'

# Filter by score threshold
sgrep --json "pattern" src/ | jq 'map(select(.score > 0.8))'

# Group by file
sgrep --json "pattern" src/ | jq 'group_by(.path) | map({path: .[0].path, count: length})'
```
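The same fields can also drive inline GitHub Actions annotations. A sketch, with a sample match standing in for real sgrep output (the `path`, `line`, `text`, and `score` fields follow the examples above):

```shell
# Convert sgrep-style JSON matches into GitHub Actions warning annotations.
# The sample array stands in for: sgrep --json "pattern" src/
matches='[{"path":"src/lib.rs","line":42,"text":"// TODO: fix this","score":0.91}]'
echo "$matches" | jq -r '.[] | "::warning file=\(.path),line=\(.line)::\(.text)"'
# → ::warning file=src/lib.rs,line=42::// TODO: fix this
```

Printed from a workflow step, these `::warning` lines surface as annotations on the relevant lines in the PR diff.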
## Common Patterns

### Enforce Documentation Coverage
```bash
# Fail if public functions lack docs
public_funcs=$(grep -r "^pub fn" src/ | wc -l)
doc_comments=$(sgrep --json "public function documentation" src/ | jq 'length')
if [ "$doc_comments" -lt "$public_funcs" ]; then
  echo "Some public functions lack documentation"
  exit 1
fi
```
### Detect Deprecated Patterns

```bash
# Find deprecated API usage
sgrep --json "deprecated or obsolete" src/ | jq -r '.[] | "\(.path):\(.line): \(.text)"'
```
### Check for Error Handling

```bash
# Flag semantic matches that also mention unsafe code
if sgrep "function without error handling" src/ | grep -q "unsafe"; then
  echo "Found potential error handling issues"
  exit 1
fi
```
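These checks compose: filter by confidence first, then gate. A sketch with sample data in place of real sgrep output (field names as in the JSON-parsing examples above):

```shell
# Gate only on high-confidence matches (score > 0.8).
# The sample array stands in for: sgrep --json "deprecated or obsolete" src/
matches='[{"path":"a.rs","line":3,"text":"old_api()","score":0.92},{"path":"b.rs","line":9,"text":"maybe()","score":0.41}]'
high=$(echo "$matches" | jq 'map(select(.score > 0.8)) | length')
echo "high-confidence matches: $high"
if [ "$high" -gt 0 ]; then
  echo "High-confidence matches found; fail the build here (exit 1)"
fi
```

Thresholding this way keeps noisy low-score matches from blocking merges while still failing on strong hits.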
Note: sgrep works offline and makes no network requests. The model is downloaded once and cached locally, making it ideal for air-gapped environments.
## Performance Tips

- Use `--json` output for efficient parsing
- Cache the model directory for faster builds
- Run sgrep on specific directories, not the entire repo
- Combine with `jq` for powerful filtering and aggregation
- Consider parallelizing scans for large codebases
```bash
# Parallel scan across multiple directories
find src -type d | parallel -j4 'sgrep --json "pattern" {}'
```