SCAN Plugin Pattern Reference

This document provides a comprehensive reference for all built-in patterns used by the SCAN plugin and guidance on creating custom patterns.

Pattern Overview

SCAN uses regular expressions (regex) to identify potential secrets in your codebase. The plugin includes over 50 built-in patterns covering common secret types, and you can add custom patterns for organization-specific secrets.

How Patterns Work

  1. Pattern Matching: SCAN scans each file line by line, applying regex patterns
  2. Context Analysis: The surrounding code context is analyzed to reduce false positives
  3. Entropy Analysis: High-entropy strings are flagged even if they don't match specific patterns
  4. Confidence Scoring: Each finding is assigned a confidence level based on multiple factors
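
The pattern-matching and entropy steps above can be sketched in a few lines of Kotlin. This is a minimal illustration, not SCAN's implementation: the Finding type, the scanLines function, the token regex, and the 4.0 bits-per-character threshold are all assumptions made for the example. Context analysis and confidence scoring are left out.

```kotlin
import kotlin.math.log2

// Hypothetical finding type; SCAN's real result model is not shown in this reference.
data class Finding(val lineNumber: Int, val match: String, val reason: String)

// Shannon entropy in bits per character.
fun shannonEntropy(s: String): Double {
    if (s.isEmpty()) return 0.0
    return s.groupingBy { it }.eachCount().values.sumOf { count ->
        val p = count.toDouble() / s.length
        -p * log2(p)
    }
}

// Candidate tokens for the entropy check: long runs of key-like characters (assumed shape).
val tokenRegex = Regex("[A-Za-z0-9/+=_-]{20,}")

fun scanLines(
    lines: List<String>,
    patterns: List<Regex>,
    entropyThreshold: Double = 4.0 // assumed threshold, not a documented SCAN value
): List<Finding> {
    val findings = mutableListOf<Finding>()
    lines.forEachIndexed { index, line ->
        // Step 1: apply each regex pattern to the line.
        val patternHits = patterns.mapNotNull { it.find(line)?.value }
        patternHits.forEach { findings += Finding(index + 1, it, "pattern") }
        // Step 3: flag high-entropy tokens that no pattern already reported.
        tokenRegex.findAll(line)
            .map { it.value }
            .filter { token -> patternHits.none { hit -> hit.contains(token) || token.contains(hit) } }
            .filter { shannonEntropy(it) >= entropyThreshold }
            .forEach { findings += Finding(index + 1, it, "entropy") }
    }
    return findings
}
```

Calling scanLines with the AWS Access Key ID pattern on a line containing AKIAIOSFODNN7EXAMPLE yields one "pattern" finding; a 40-character random-looking token with no matching pattern is reported with the reason "entropy" instead.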

Built-in Patterns

AWS Patterns

AWS Access Key ID

Pattern: AKIA[0-9A-Z]{16}

Example: AKIAIOSFODNN7EXAMPLE

Confidence: High

AWS Secret Access Key

Pattern: aws(.{0,20})?['"][0-9a-zA-Z/+]{40}['"]

Example: aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

Confidence: High
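
To see what the two AWS patterns do and do not catch, they can be tried outside the plugin. The sketch below copies the patterns from the entries above; the matchAws helper is a hypothetical name used only for illustration.

```kotlin
// Patterns copied from the entries above; raw strings avoid double escaping.
val awsAccessKeyId = Regex("""AKIA[0-9A-Z]{16}""")
val awsSecretAccessKey = Regex("""aws(.{0,20})?['"][0-9a-zA-Z/+]{40}['"]""")

// Returns the names of the AWS patterns that match anywhere in the line.
fun matchAws(line: String): List<String> = buildList {
    if (awsAccessKeyId.containsMatchIn(line)) add("AWS Access Key ID")
    if (awsSecretAccessKey.containsMatchIn(line)) add("AWS Secret Access Key")
}
```

Note that the secret key pattern requires the value to be quoted, so an unquoted assignment such as aws_secret_access_key=wJalr... is not matched by this pattern alone.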

Google Cloud Platform (GCP)

GCP API Key

Pattern: AIza[0-9A-Za-z-_]{35}

Example: AIzaSyDaGmWKa4JsXZ-HjGw7ISLn_3namBGewQe

Confidence: High

Azure

Azure Storage Account Key

Pattern: DefaultEndpointsProtocol=https;AccountName=.*;AccountKey=.*

Description: Azure storage connection strings

Confidence: High

Cloud Services

GitHub

GitHub Personal Access Token

Pattern: ghp_[A-Za-z0-9]{36}

Example: ghp_1234567890abcdef1234567890abcdef1234

Confidence: High

GitHub OAuth Token

Pattern: gho_[A-Za-z0-9]{36}

Description: GitHub OAuth access tokens

Confidence: High

GitLab

GitLab Personal Access Token

Pattern: glpat-[A-Za-z0-9-_]{20}

Description: GitLab personal access tokens

Confidence: High

Database Credentials

Generic Database URLs

Database Connection String

Pattern: (mysql|postgresql|mongodb|redis)://[^\s]+:[^\s]+@[^\s]+

Example: mysql://user:password@localhost:3306/database

Confidence: High
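
A matched connection string contains the password in clear text, so a report usually should not echo it. The following sketch assumes the character classes in the pattern are meant to exclude whitespace ([^\s]); redactDbPassword is a hypothetical helper, not part of SCAN.

```kotlin
// Assumes the character classes exclude whitespace; this is a reading of the entry, not SCAN's source.
val dbUrl = Regex("""(mysql|postgresql|mongodb|redis)://[^\s]+:[^\s]+@[^\s]+""")

// Replaces the password portion of every matched URL with "****".
fun redactDbPassword(line: String): String =
    dbUrl.replace(line) { m ->
        val url = m.value
        val schemeEnd = url.indexOf("://") + 3
        val colon = url.indexOf(':', schemeEnd) // separator between user and password
        val at = url.lastIndexOf('@')           // separator between credentials and host
        url.substring(0, colon + 1) + "****" + url.substring(at)
    }
```

URLs without embedded credentials, such as postgresql://localhost:5432/db, contain no @ and are left untouched because the pattern does not match them.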

API Keys and Tokens

Service-Specific APIs

Slack Token

Pattern: xox[baprs]-[A-Za-z0-9]{10,48}

Example: xoxb-1234567890-1234567890-abcdefghijklmnopqrstuvwx

Confidence: High
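
The character class in the Slack pattern does not include a hyphen, so on a multi-segment token the match stops at the second hyphen. The sketch below shows the exact substring the pattern reports; slackMatch is a hypothetical helper name.

```kotlin
val slackToken = Regex("""xox[baprs]-[A-Za-z0-9]{10,48}""")

// Returns the exact substring the pattern matches, or null when there is none.
fun slackMatch(line: String): String? = slackToken.find(line)?.value
```

For the example token above this returns only the prefix and first segment. That is enough to flag the line, but the reported match is shorter than the full token, and a token whose first segment has fewer than 10 characters would not be matched at all.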

Stripe API Key

Pattern: sk_live_[A-Za-z0-9]{24}

Description: Stripe live API keys

Confidence: High

Cryptographic Keys

Private Key

Pattern: -----BEGIN [A-Z]+ PRIVATE KEY-----

Description: PEM-formatted private keys

Confidence: High
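
The private key pattern requires a key type between BEGIN and PRIVATE KEY. A quick check, sketched below, shows which PEM headers it covers. The second regex, anyPrivateKeyHeader, is a hypothetical broader variant for comparison and is not a SCAN built-in.

```kotlin
// Pattern copied from the entry above.
val privateKeyHeader = Regex("""-----BEGIN [A-Z]+ PRIVATE KEY-----""")

// Hypothetical variant that also accepts the untyped PKCS#8 header.
val anyPrivateKeyHeader = Regex("""-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----""")

fun hasTypedPrivateKeyHeader(text: String): Boolean = privateKeyHeader.containsMatchIn(text)
fun hasAnyPrivateKeyHeader(text: String): Boolean = anyPrivateKeyHeader.containsMatchIn(text)
```

Headers such as -----BEGIN RSA PRIVATE KEY----- and -----BEGIN OPENSSH PRIVATE KEY----- match the documented pattern, while the untyped header -----BEGIN PRIVATE KEY----- matches only the broader variant. A custom pattern could close that gap if your repositories contain such keys.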

Custom Pattern Creation

Basic Custom Patterns

Add custom patterns to catch organization-specific secrets:

scan {
    customPatterns = listOf(
        "MYCOMPANY_API_[A-Z0-9]{32}",
        "INTERNAL_SECRET_[a-f0-9]{64}",
        "PROD_KEY_[A-Za-z0-9\\-_]{40}"
    )
}
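
Before adding patterns to the build script, it can help to try the same strings against sample input in plain Kotlin. The sketch below reuses the three pattern strings from the block above; matchingPatterns is a hypothetical helper. Because these are ordinary Kotlin strings, a regex backslash is written as \\.

```kotlin
// Same strings as in the scan { } block above.
val customPatterns = listOf(
    "MYCOMPANY_API_[A-Z0-9]{32}",
    "INTERNAL_SECRET_[a-f0-9]{64}",
    "PROD_KEY_[A-Za-z0-9\\-_]{40}" // "\\-" in Kotlin source is the regex "\-", a literal hyphen
)

// Returns the pattern strings that match somewhere in the input line.
fun matchingPatterns(line: String): List<String> =
    customPatterns.filter { Regex(it).containsMatchIn(line) }
```

A line such as key=MYCOMPANY_API_ followed by 32 uppercase letters or digits returns only the first pattern; a shorter suffix returns an empty list.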

Advanced Custom Patterns

scan {
    customPatterns = listOf(
        // Match quoted password assignments of 8+ characters (case-insensitive)
        "(?i)password\\s*[=:]\\s*[\"'][^\"']{8,}[\"']",
        
        // Match API key assignments and capture the key value (case-sensitive)
        "(?:api[_-]?key|apikey)\\s*[=:]\\s*[\"']?([A-Za-z0-9]{20,})[\"']?",
        
        // Organization-specific format
        "ACME_[A-Z]{2}_[0-9]{8}_[A-Za-z0-9]{16}"
    )
}
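
The API key pattern above contains a capture group around the key value. As a rough illustration of what that group captures, the sketch below applies the same regex in plain Kotlin; extractApiKey is a hypothetical helper, and how SCAN itself uses capture groups is not described in this reference.

```kotlin
// Same regex string as the second pattern in the block above.
val apiKeyAssignment =
    Regex("(?:api[_-]?key|apikey)\\s*[=:]\\s*[\"']?([A-Za-z0-9]{20,})[\"']?")

// Returns only the key value (capture group 1), or null if the line has no match.
fun extractApiKey(line: String): String? =
    apiKeyAssignment.find(line)?.groupValues?.get(1)
```

Unlike the password pattern, this one has no (?i) flag, so API_KEY=... in upper case is not matched unless the flag is added.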

Pattern Files

You can organize custom patterns in separate YAML files for better maintainability:

patterns/custom-patterns.yml

patterns:
  - name: "Company API Key"
    regex: "COMPANY_API_[A-Z0-9]{32}"
    confidence: "high"
    description: "Internal company API keys"
    
  - name: "Internal Secret"
    regex: "INTERNAL_SECRET_[A-Za-z0-9]{16,}"
    confidence: "medium"
    description: "Internal application secrets"

Loading Pattern Files

scan {
    customPatternsFile = "patterns/custom-patterns.yml"
}
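
A typo in a pattern file is easier to catch before the scan runs. The sketch below validates entries after they have been read from YAML by whatever means you prefer. PatternDef, allowedConfidence, and validate are hypothetical names, and the set of allowed confidence values is an assumption: only "high" and "medium" appear in the example file.

```kotlin
import java.util.regex.PatternSyntaxException

// Hypothetical in-memory form of one entry under "patterns:"; SCAN's own loader types are not documented here.
data class PatternDef(
    val name: String,
    val regex: String,
    val confidence: String,
    val description: String = ""
)

// Assumed set of confidence values.
val allowedConfidence = setOf("high", "medium", "low")

// Returns a list of human-readable problems; an empty list means the entry looks usable.
fun validate(def: PatternDef): List<String> {
    val problems = mutableListOf<String>()
    if (def.name.isBlank()) problems += "name is blank"
    if (def.confidence !in allowedConfidence) problems += "unknown confidence '${def.confidence}'"
    try {
        Regex(def.regex) // compiling the regex surfaces syntax errors early
    } catch (e: PatternSyntaxException) {
        problems += "invalid regex: ${e.description}"
    }
    return problems
}
```

Both entries from the example file pass this check, while an entry with an unclosed character class such as [A-Z is reported as an invalid regex.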

Best Practices

Pattern Design

  1. Be Specific: Avoid overly broad patterns that cause false positives
  2. Use Anchors: Use ^ and $ when matching entire lines
  3. Consider Context: Think about where the pattern might appear
  4. Test Thoroughly: Validate against real codebases
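
The effect of anchors, and a simple way to test a pattern against known samples, can be sketched as follows. The ACME format is taken from the custom pattern examples in this document; the helper names are hypothetical.

```kotlin
val anchored = Regex("^ACME_[A-Z]{2}_[0-9]{8}_[A-Za-z0-9]{16}$")
val unanchored = Regex("ACME_[A-Z]{2}_[0-9]{8}_[A-Za-z0-9]{16}")

// True only when the whole line is exactly one ACME key.
fun isWholeLineKey(line: String): Boolean = anchored.containsMatchIn(line)

// True when an ACME key appears anywhere in the line.
fun containsKey(line: String): Boolean = unanchored.containsMatchIn(line)

// Runs positive and negative samples through a predicate and returns the samples with unexpected results.
fun unexpected(
    predicate: (String) -> Boolean,
    shouldMatch: List<String>,
    shouldNotMatch: List<String>
): List<String> = shouldMatch.filterNot(predicate) + shouldNotMatch.filter(predicate)
```

With anchors, a line like token = ACME_US_20240101_abcdEFGH1234ijkl is not matched because of the leading text, so anchors suit files that hold one secret per line rather than source code. An empty result from unexpected means every sample behaved as intended.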

Common Pitfalls

❌ Overly Broad Patterns

// Too broad - matches any line that contains "password"
".*password.*"

// Better - more specific
"password\\s*=\\s*[\"'][^\"']{8,}[\"']"

✅ Case-Insensitive Patterns

// Case sensitive - might miss variations
"Password"

// Case insensitive
"(?i)password"

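The difference between these patterns is easiest to see by counting hits on a few sample lines. The sketch below is illustrative only: the sample lines are made up, and specificAnyCase simply combines the specific pattern with the (?i) flag.

```kotlin
val tooBroad = Regex(".*password.*")
val specific = Regex("password\\s*=\\s*[\"'][^\"']{8,}[\"']")
val specificAnyCase = Regex("(?i)password\\s*=\\s*[\"'][^\"']{8,}[\"']")

// Counts how many lines a pattern flags.
fun countHits(pattern: Regex, lines: List<String>): Int =
    lines.count { pattern.containsMatchIn(it) }

// Made-up sample input: two harmless lines and two hard-coded secrets.
val sampleLines = listOf(
    "// TODO: add a password reset page",   // prose, not a secret
    "val passwordField = findView(\"pw\")", // identifier, not a secret
    "password = \"hunter2-hunter2\"",       // hard-coded secret
    "PASSWORD = 'correct-horse-battery'"    // hard-coded secret, upper case
)
```

On these samples the broad pattern flags three lines, two of them false positives, and still misses the upper-case secret. The specific pattern flags one line, and the case-insensitive variant flags exactly the two real secrets.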