Minimal, extensible AI-assisted code review tool for PHP projects.
- Analyzes unified diffs (from Pull/Merge Requests or files)
- Produces normalized findings (machine-readable JSON or human summary)
- Loads a simple YAML/JSON config with provider/policy settings and an optional coding guidelines file
- Safe defaults: deterministic Mock AI provider; no network calls unless configured
Official documentation: Docs
- Objectives and scope
- Architecture and main modules
- Quick start
- Configuration (.aicodereview.yml)
- VCS adapters (GitHub/GitLab)
- Coding guidelines file
- AI providers and token budgeting
- Security & Performance
- Output formats
- Development and QA
- Credits
- License
- Functional
  - Analyze diffs and produce review findings for coding standard violations and simple risk patterns.
  - Dynamic configuration for providers, policy, token budget, rules, and VCS.
  - Post results back to PR/MR via platform adapters when requested.
- Non-functional
  - Safe defaults: no external calls by default (mock provider) and no PR comments unless --comment is passed.
  - Modular design to plug in real LLM providers and VCS platforms.
- bin/aicr: CLI entry point (Symfony Console) running the review command in single-command mode.
- src/Command/ReviewCommand.php: Orchestrates reading config, loading diff (from file or git), running Pipeline, and optional PR/MR commenting. Uses Symfony Process for git.
- src/Config.php: Loads YAML/JSON config, merges with defaults, expands ${ENV} variables, exposes sections (providers, context, policy, vcs, prompts).
- src/DiffParser.php: Minimal unified diff parser returning added lines per file with accurate line numbers.
- src/Pipeline.php: End-to-end pipeline: parse diff, build AI provider, chunk with token budget, apply policy, and render output.
- src/Adapters/: VcsAdapter interface and GithubAdapter/GitlabAdapter/BitbucketAdapter implementations (resolve branches from PR/MR id and post comments).
- src/Providers/: AIProvider interface and concrete providers (OpenAI, Gemini, Anthropic, Ollama, Mock).
- src/Support/: Core utility classes for enhanced functionality:
  - ChunkBuilder: Intelligent diff chunking with semantic analysis and optimization
  - TokenBudget: Advanced token management with compression and per-file caps
  - ResourceManager: Safe resource handling with automatic cleanup
  - ApiCache: Response caching with TTL and size management
  - InputSanitizer: Security-focused input validation and sanitization
  - DiffProcessor: Enhanced diff processing with filtering capabilities
  - SemanticChunker: Context-aware code chunking for better AI analysis
- src/Config/Constants: Centralized configuration constants replacing magic numbers and strings.
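To make the DiffParser description above more concrete, here is a minimal sketch of how added lines and their new-file line numbers can be pulled out of a unified diff. It is not the project's implementation: the function name parseAddedLines and the returned array shape are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the project's DiffParser): collects added lines
 * per file from a unified diff, keyed by their line number in the new file.
 *
 * @return array<string, array<int, string>>
 */
function parseAddedLines(string $diff): array
{
    $added = [];
    $file = null;   // path taken from the current "+++ " header
    $newLine = 0;   // next line number on the new side of the hunk
    $oldLeft = 0;   // old-side lines still expected in the current hunk
    $newLeft = 0;   // new-side lines still expected in the current hunk

    foreach (preg_split('/\R/', $diff) as $line) {
        if ($oldLeft > 0 || $newLeft > 0) {
            // Inside a hunk: the first character tells the line type.
            $marker = $line[0] ?? ' ';
            if ($marker === '+') {
                if ($file !== null) {
                    $added[$file][$newLine] = substr($line, 1);
                }
                $newLine++;
                $newLeft--;
            } elseif ($marker === '-') {
                $oldLeft--;
            } elseif ($marker !== '\\') {
                // Context line: present on both sides.
                $newLine++;
                $oldLeft--;
                $newLeft--;
            }
            continue;
        }

        if (preg_match('#^\+\+\+ (?:b/)?([^\t]+)#', $line, $m)) {
            $file = $m[1] === '/dev/null' ? null : $m[1];
        } elseif (preg_match('/^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/', $line, $m)) {
            // Omitted counts default to 1 in the unified diff format.
            $oldLeft = ($m[1] ?? '') === '' ? 1 : (int) $m[1];
            $newLine = (int) $m[2];
            $newLeft = ($m[3] ?? '') === '' ? 1 : (int) $m[3];
        }
    }

    return $added;
}
```

Counting the remaining lines of each hunk, rather than only looking at prefixes, keeps an added line that happens to start with "++" from being mistaken for a file header.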
- Install dependencies via Composer:
composer install
- Option A: Analyze an existing diff file
  - Create or use a unified diff, e.g., examples/sample.diff.
  - Run:
php bin/aicr review --diff-file examples/sample.diff --output summary
php bin/aicr review --diff-file examples/sample.diff --output json
php bin/aicr review --diff-file examples/sample.diff --output summary --provider openai
- Option B: Analyze a PR/MR by ID using git
  - Configure vcs.platform in .aicodereview.yml (github or gitlab) and set required identifiers/tokens.
  - Then run (the command fetches branches, computes diff, and analyzes it):
php bin/aicr review --id 123 --output summary
php bin/aicr review --id 123 --output summary --provider gemini
- To also post a comment back to the PR/MR, add --comment:
php bin/aicr review --id 123 --output summary --comment
php bin/aicr review --id 123 --output summary --comment --provider anthropic
Notes
- Provide --config <path> to use a non-default config file.
- Use --provider <name> to override the default provider from config (e.g., openai, gemini, anthropic, ollama, mock).
- Without --diff-file, --id is required and branches are resolved via the configured adapter.
Example (see .aicodereview.yml in this repo and examples/config.*.yml):
version: 1
providers:
  # Safe deterministic provider by default
  default: mock
context:
  diff_token_limit: 8000
  overflow_strategy: trim
  per_file_token_cap: 2000
  enable_semantic_chunking: true
  enable_diff_compression: true
policy:
  min_severity_to_comment: info
  max_comments: 50
  redact_secrets: true
  consolidate_similar_findings: true
  max_findings_per_file: 5
  severity_limits:
    error: 10
    warning: 10
    info: 5
guidelines_file: null
vcs:
  # Set one of: github | gitlab | bitbucket
  platform: null
  # GitHub: owner/repo (optional if GH_REPO env or remote origin is GitHub)
  repo: null
  # GitLab: numeric id or full path namespace/repo (optional if GL_PROJECT_ID or remote origin is GitLab)
  project_id: null
  # GitLab: override API base for self-hosted instances (e.g., https://gitlab.example.com/api/v4)
  api_base: null
  # Bitbucket: workspace name (required for Bitbucket)
  workspace: null
  # Bitbucket: repository name (required for Bitbucket)
  repository: null
  # Bitbucket: access token for authentication (required for Bitbucket)
  accessToken: null
  # Bitbucket: API request timeout in seconds (optional, defaults to 30)
  timeout: 30
prompts:
  # Optional: append additional instructions to the base prompts used by the LLM
  # You can use single strings or lists of strings
  system_append: "Prefer concise findings and avoid duplicates."
  user_append:
    - "Prioritize security and performance related issues."
  extra:
    - "If a secret or key is detected, suggest redaction."
excludes:
  # Array of paths to exclude from code review
  # Each element is treated as glob, regex, or relative path from project root
  # Examples:
  - "*.md"           # Exclude all markdown files (glob)
  - "composer.lock"  # Exclude specific files (exact match)
  - "tests/*.php"    # Exclude files in specific directories with patterns (glob)
  - "vendor"         # Exclude entire vendor directory (directory)
  - "node_modules"   # Exclude node_modules directory (directory)
  - "build"          # Exclude build artifacts (directory)
  - "dist"           # Exclude distribution files (directory)
Notes
- Env var expansion works in any string value: ${VAR_NAME}.
- Tokens/ids are read from env if not set: GH_TOKEN/GITHUB_TOKEN, GL_TOKEN/GITLAB_TOKEN, GH_REPO, GL_PROJECT_ID.
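The ${VAR_NAME} expansion mentioned in the notes can be pictured with a short sketch. This is an illustration only, not the code in src/Config.php: the function name expandEnv is hypothetical, and leaving unset variables as the literal placeholder is an assumption about the desired behavior.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: recursively replaces ${VAR_NAME} placeholders in
 * every string value of a decoded config array.
 * Assumption: variables that are not set stay as the literal placeholder.
 */
function expandEnv(mixed $value): mixed
{
    if (is_array($value)) {
        // array_map keeps string keys when given a single array.
        return array_map('expandEnv', $value);
    }
    if (!is_string($value)) {
        return $value; // numbers, booleans and null pass through unchanged
    }

    return preg_replace_callback(
        '/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/',
        static function (array $m): string {
            $env = getenv($m[1]);
            return $env === false ? $m[0] : $env;
        },
        $value
    );
}
```

Applied to a decoded config, a value such as accessToken: "${BITBUCKET_TOKEN}" (a made-up variable name) would be replaced by the content of that environment variable, while non-string values such as timeout: 30 are untouched.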
- Configure vcs.platform and required parameters as needed.
- The review command supports a single --id option (PR number for GitHub, MR IID for GitLab, PR ID for Bitbucket).
- Behavior when --diff-file is omitted:
  - Resolve base/head branches from the ID via platform API.
  - git fetch --all; fetch base/head; compute git diff base...head.
  - Run the analysis pipeline on that diff.
- --comment posts the summary back via the adapter.
- You can provide a project coding standard or style guide via guidelines_file in .aicodereview.yml.
- When set, its content is embedded into the LLM prompts as a base64 string. The prompt explicitly instructs the model to base64-decode the guidelines and follow them strictly during the review.
- No provider-specific file uploads are performed: all supported providers (OpenAI, Gemini, Anthropic, Ollama) receive the same base64-embedded guidelines in the prompt.
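The base64 embedding described above could look roughly like the following sketch. The function name buildSystemPrompt, the GUIDELINES_BASE64 label, and the exact instruction wording are assumptions for illustration; the real prompts live in the provider classes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical prompt builder: appends the guidelines file content to a base
 * prompt as a base64 string, together with an instruction to decode it.
 * The label and instruction text are assumptions, not the project's wording.
 */
function buildSystemPrompt(string $basePrompt, ?string $guidelines): string
{
    if ($guidelines === null || trim($guidelines) === '') {
        return $basePrompt; // no guidelines configured: prompt is unchanged
    }

    return $basePrompt . "\n\n"
        . "The project coding guidelines follow as a base64 string. "
        . "Decode them and follow them strictly during the review.\n"
        . 'GUIDELINES_BASE64: ' . base64_encode($guidelines);
}
```

Because the guidelines travel inside the prompt text, the same string can be sent to any provider without a provider-specific upload step.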
- Supported providers in this repository: openai, gemini, anthropic, ollama, mock.
- Select via providers.default and configure each provider section accordingly (see src/Providers/* for options).
- Token budgeting is approximate (chars/4). Global and per-file caps are configurable; overflow_strategy defaults to trim.
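A minimal sketch of the chars/4 estimate combined with a global limit and a per-file cap is shown below. It is not the project's TokenBudget class: the function names and the exact trimming behavior are assumptions, and strlen (bytes) stands in for a character count.

```php
<?php
declare(strict_types=1);

/** Approximate token count: one token per four bytes, rounded up. */
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

/**
 * Hypothetical "trim" strategy: cut each file's diff to the per-file cap and
 * drop whatever no longer fits once the global limit is used up.
 *
 * @param array<string, string> $chunks file path => diff text
 * @return array<string, string>
 */
function applyBudget(array $chunks, int $globalLimit, int $perFileCap): array
{
    $kept = [];
    $remaining = $globalLimit;

    foreach ($chunks as $file => $text) {
        if ($remaining <= 0) {
            break; // global budget spent: later files are dropped
        }
        $allowed = min($perFileCap, $remaining);
        if (estimateTokens($text) > $allowed) {
            $text = substr($text, 0, $allowed * 4); // trim to the allowed size
        }
        $kept[$file] = $text;
        $remaining -= estimateTokens($text);
    }

    return $kept;
}
```

With the example config values, this would be called as applyBudget($chunks, 8000, 2000), matching diff_token_limit and per_file_token_cap.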
The system includes sophisticated token cost optimization capabilities:
- Semantic Chunking: Enable with enable_semantic_chunking: true to group related code changes by context (classes, methods, etc.)
- Diff Compression: Enable with enable_diff_compression: true to intelligently compress diffs while maintaining semantic meaning
- Trivial Change Filtering: Automatically filters out whitespace-only changes, TODO comments, and import statements
- Similar Finding Consolidation: Set consolidate_similar_findings: true to aggregate similar issues across multiple files
- Per-file Limits: Control review scope with max_findings_per_file to prevent overwhelming output
- Severity Limits: Fine-tune output with severity_limits to cap the number of findings by severity level
These optimizations can reduce token usage by 30-50% for input and 40-60% for output while maintaining review quality. See docs/token-cost-optimization.md for a detailed implementation guide.
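To illustrate how max_findings_per_file and severity_limits might interact, here is a small sketch. The finding shape (file, severity, message keys) and the function name applyLimits are hypothetical, and the order in which the two caps are checked is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical policy filter: keeps findings in order until a per-file cap
 * or a per-severity cap is reached. The finding shape is an assumption.
 *
 * @param list<array{file: string, severity: string, message: string}> $findings
 * @param array<string, int> $severityLimits e.g. ['error' => 10, 'info' => 5]
 * @return list<array{file: string, severity: string, message: string}>
 */
function applyLimits(array $findings, int $maxPerFile, array $severityLimits): array
{
    $perFile = [];
    $perSeverity = [];
    $kept = [];

    foreach ($findings as $finding) {
        $file = $finding['file'];
        $severity = $finding['severity'];
        $fileCount = $perFile[$file] ?? 0;
        $severityCount = $perSeverity[$severity] ?? 0;

        if ($fileCount >= $maxPerFile) {
            continue; // this file already has enough findings
        }
        if (isset($severityLimits[$severity]) && $severityCount >= $severityLimits[$severity]) {
            continue; // this severity level is capped
        }

        $perFile[$file] = $fileCount + 1;
        $perSeverity[$severity] = $severityCount + 1;
        $kept[] = $finding;
    }

    return $kept;
}
```

Severities without an entry in the limits array are only bounded by the per-file cap in this sketch.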
The security and performance work introduces significant enhancements focused on security hardening, performance optimization, and code quality:
- InputSanitizer: Comprehensive input validation and sanitization for all external data
  - Branch name, repository name, and file path validation
  - API response sanitization to prevent injection attacks
  - URL and commit SHA validation with strict patterns
- Resource Management: Safe resource handling with automatic cleanup
  - Temporary file and directory management
  - Resource leak prevention with shutdown handlers
  - Exception-safe cleanup with try-finally patterns
- Intelligent Chunking: Enhanced ChunkBuilder with semantic analysis
  - Batch processing for better memory management
  - Parallel-friendly architecture for large diffs
  - Context-aware chunking for improved AI analysis
- Advanced Token Management: Improved TokenBudget with compression
  - Per-file token caps to prevent oversized chunks
  - Diff compression for large files
  - Smart budget allocation and overflow handling
- API Response Caching: New ApiCache system for improved performance
  - TTL-based caching with automatic expiration
  - Size-limited cache with LRU eviction
  - Request deduplication and response reuse
- Constants Centralization: All magic numbers and strings moved to Constants class
- Enhanced Error Handling: Standardized exception handling across all providers
- Improved Documentation: Comprehensive PHPDoc comments and inline documentation
- Security Audit: Fixed potential security issues identified in code review
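The TTL and LRU ideas listed under API Response Caching can be sketched in a few lines. This is not the project's ApiCache: the class name TinyCache is hypothetical, it lives in memory only, and it limits the number of entries rather than the size in bytes. The optional $now argument exists only to make the behavior easy to test.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory cache with TTL expiry and LRU eviction.
 * Assumption: the limit is a number of entries, not a byte size.
 */
final class TinyCache
{
    /** @var array<string, array{value: mixed, expires: int}> oldest entry first */
    private array $items = [];

    public function __construct(private int $ttl, private int $maxEntries)
    {
    }

    public function get(string $key, ?int $now = null): mixed
    {
        $now ??= time();
        if (!isset($this->items[$key])) {
            return null;
        }
        $item = $this->items[$key];
        unset($this->items[$key]);
        if ($item['expires'] <= $now) {
            return null; // expired entries are dropped on access
        }
        $this->items[$key] = $item; // re-append: now the most recently used
        return $item['value'];
    }

    public function set(string $key, mixed $value, ?int $now = null): void
    {
        $now ??= time();
        unset($this->items[$key]);
        $this->items[$key] = ['value' => $value, 'expires' => $now + $this->ttl];
        while (count($this->items) > $this->maxEntries) {
            // PHP arrays keep insertion order, so the first key is the least recently used.
            unset($this->items[array_key_first($this->items)]);
        }
    }
}
```

A provider wrapper could key such a cache on a hash of the request payload, so that an identical chunk sent twice within the TTL reuses the earlier response.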
New configuration options available:
context:
  enable_semantic_chunking: true  # Enable context-aware chunking
  enable_diff_compression: true   # Enable diff compression for large files
  cache_ttl: 3600                 # API response cache TTL in seconds
  max_cache_size: 52428800        # Maximum cache size in bytes (50MB)
- json (default): machine-readable findings array.
- summary: human-readable bulleted list. This is also the format used for PR/MR comments.
- markdown: structured markdown format with emojis, metadata, and organized findings by severity and file.
- Requires PHP and Composer.
- Run unit and E2E tests with PHPUnit:
./vendor/bin/phpunit
- Coding standards and static analysis:
composer analyse
- The codebase uses declare(strict_types=1) and Symfony components (Console, YAML, Filesystem, Process).
- Author: Raffaele Carelle
- Contributors: Thanks to everyone who reports issues or submits PRs.
This project is open-sourced under the MIT License. See the LICENSE file for details.
