# CI Pipelines
sgrep fits naturally into continuous integration pipelines. Use semantic search to enforce code quality standards, detect patterns, and gate deployments.
## GitHub Actions

### Basic Quality Gate
Fail the pipeline if sensitive patterns are found:
```yaml
name: Security Scan
on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install sgrep
        run: curl -sSf https://sgrep.sh/install.sh | sh
      - name: Scan for hardcoded secrets
        run: |
          if sgrep --json "hardcoded API key or secret" src/ | jq -e 'length > 0' > /dev/null; then
            echo "Found potential hardcoded secrets"
            exit 1
          fi
```

Note the `-e` flag: plain `jq` exits 0 whether it prints `true` or `false`, so without `-e` the `if` would always trigger.
### PR Comment on Matches
Leave a comment on pull requests when issues are found:
```yaml
name: Code Review
on: pull_request

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install sgrep
        run: curl -sSf https://sgrep.sh/install.sh | sh
      - name: Check for TODO comments
        id: scan
        run: |
          result=$(sgrep --json "TODO or FIXME" src/)
          echo "result<<EOF" >> "$GITHUB_OUTPUT"
          echo "$result" >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"
      - uses: actions/github-script@v7
        if: steps.scan.outputs.result != '[]'
        env:
          SCAN_RESULT: ${{ steps.scan.outputs.result }}
        with:
          script: |
            const result = JSON.parse(process.env.SCAN_RESULT);
            const body = result.map(r => `- **${r.path}:${r.line}**\n  ${r.text}`).join('\n');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Found ${result.length} TODOs:\n${body}`
            });
```

The heredoc lines must all append (`>>`) to `$GITHUB_OUTPUT`, and the result is passed through an environment variable rather than interpolated directly into the script, which would break on multiline JSON.
### Caching the Model
Speed up builds by caching the downloaded model:
```yaml
- name: Cache sgrep model
  uses: actions/cache@v4
  with:
    path: ~/.cache/sgrep
    key: sgrep-model-${{ hashFiles('**/Cargo.lock') }}
```
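On a cache miss, the model can be fetched eagerly so the first real query doesn't pay the download cost. A sketch, assuming the model is downloaded on first use; the `model-cache` id and the warm-up query are arbitrary choices, not part of sgrep itself:

```yaml
- name: Cache sgrep model
  id: model-cache
  uses: actions/cache@v4
  with:
    path: ~/.cache/sgrep
    key: sgrep-model-${{ hashFiles('**/Cargo.lock') }}
- name: Warm model cache
  if: steps.model-cache.outputs.cache-hit != 'true'
  run: sgrep --json "warm up" README.md > /dev/null || true
```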
## GitLab CI

### Security Scanning
```yaml
stages:
  - test

security:
  stage: test
  image: rust:latest
  before_script:
    - cargo install sgrep --git https://github.com/henrikalbihn/sgrep
  script:
    # Scan for sensitive patterns
    - |
      matches=$(sgrep --json "password or credential or secret" src/)
      if [ "$matches" != "[]" ]; then
        echo "Security issues found:"
        echo "$matches" | jq -r '.[] | "\(.path):\(.line): \(.text)"'
        exit 1
      fi
  allow_failure: false
```
### Quality Metrics
Track code quality over time:
```yaml
quality:
  stage: test
  script:
    - cargo install sgrep
    # Count complex functions (heuristic)
    - complex=$(sgrep --json "complex function or nested logic" src/ | jq 'length')
    # GitLab metrics reports expect the OpenMetrics text format: "name value"
    - echo "complex_functions $complex" > metrics.txt
  artifacts:
    reports:
      metrics: metrics.txt
```
## Jenkins Pipeline
```groovy
pipeline {
    agent any
    stages {
        stage('Install sgrep') {
            steps {
                sh 'curl -sSf https://sgrep.sh/install.sh | sh'
            }
        }
        stage('Code Scan') {
            steps {
                script {
                    def matches = sh(
                        script: 'sgrep --json "deprecated API usage" src/',
                        returnStdout: true
                    )
                    // readJSON requires the Pipeline Utility Steps plugin
                    def count = readJSON(text: matches).size()
                    if (count > 0) {
                        unstable("Found ${count} deprecated API uses")
                    }
                }
            }
        }
    }
}
```
## Parsing JSON Output

The `--json` flag outputs structured data that's easy to parse:
```bash
# Count total matches
count=$(sgrep --json "pattern" src/ | jq 'length')

# Get only file paths
sgrep --json "pattern" src/ | jq -r '.[].path' | sort -u

# Extract specific fields
sgrep --json "pattern" src/ | jq -r '.[] | "\(.path):\(.line)"'

# Filter by score threshold
sgrep --json "pattern" src/ | jq 'map(select(.score > 0.8))'

# Group by file
sgrep --json "pattern" src/ | jq 'group_by(.path) | map({path: .[0].path, count: length})'
```
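The same fields can also drive inline GitHub Actions annotations. A sketch, with a sample match standing in for real sgrep output (the `path`, `line`, `text`, and `score` fields follow the examples above):

```shell
# Convert sgrep-style JSON matches into GitHub Actions warning annotations.
# The sample array stands in for: sgrep --json "pattern" src/
matches='[{"path":"src/lib.rs","line":42,"text":"// TODO: fix this","score":0.91}]'
echo "$matches" | jq -r '.[] | "::warning file=\(.path),line=\(.line)::\(.text)"'
# → ::warning file=src/lib.rs,line=42::// TODO: fix this
```

Printed from a workflow step, these `::warning` lines surface as annotations on the relevant lines in the PR diff.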
## Common Patterns

### Enforce Documentation Coverage
```bash
# Fail if public functions lack docs
public_funcs=$(grep -r "^pub fn" src/ | wc -l)
doc_comments=$(sgrep --json "public function documentation" src/ | jq 'length')
if [ "$doc_comments" -lt "$public_funcs" ]; then
  echo "Some public functions lack documentation"
  exit 1
fi
```
### Detect Deprecated Patterns

```bash
# Find deprecated API usage
sgrep --json "deprecated or obsolete" src/ | jq -r '.[] | "\(.path):\(.line): \(.text)"'
```
### Check for Error Handling

```bash
# Flag semantic matches that also mention unsafe code
if sgrep "function without error handling" src/ | grep -q "unsafe"; then
  echo "Found potential error handling issues"
  exit 1
fi
```
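These checks compose: filter by confidence first, then gate. A sketch with sample data in place of real sgrep output (field names as in the JSON-parsing examples above):

```shell
# Gate only on high-confidence matches (score > 0.8).
# The sample array stands in for: sgrep --json "deprecated or obsolete" src/
matches='[{"path":"a.rs","line":3,"text":"old_api()","score":0.92},{"path":"b.rs","line":9,"text":"maybe()","score":0.41}]'
high=$(echo "$matches" | jq 'map(select(.score > 0.8)) | length')
echo "high-confidence matches: $high"
if [ "$high" -gt 0 ]; then
  echo "High-confidence matches found; fail the build here (exit 1)"
fi
```

Thresholding this way keeps noisy low-score matches from blocking merges while still failing on strong hits.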
Note: sgrep works offline and makes no network requests. The model is downloaded once and cached locally, making it ideal for air-gapped environments.
## Performance Tips

- Use `--json` output for efficient parsing
- Cache the model directory for faster builds
- Run sgrep on specific directories, not the entire repo
- Combine with `jq` for powerful filtering and aggregation
- Consider parallelizing scans for large codebases
```bash
# Parallel scan across multiple directories
find src -type d | parallel -j4 'sgrep --json "pattern" {}'
```