AI Code Review Bot

Minimal, extensible AI-assisted code review tool for PHP projects.

- Analyzes unified diffs (from Pull/Merge Requests or files).
- Produces normalized findings (machine-readable JSON or human-readable summary).
- Loads a simple YAML/JSON config with provider/policy settings and an optional coding guidelines file.
- Safe defaults: deterministic Mock AI provider; no network calls unless configured.

Official documentation: Docs

1. Objectives and scope

Functional
- Analyze diffs and produce review findings for coding standard violations and simple risk patterns.
- Dynamic configuration for providers, policy, token budget, rules, and VCS.
- Post results back to PR/MR via platform adapters when requested.
Non-functional
- Safe defaults: no external calls (mock provider) and no PR/MR comments unless --comment is passed.
- Modular design to plug real LLM providers and VCS platforms.

2. Architecture and main modules (PHP)

- bin/aicr: CLI entry point (Symfony Console) running the review command in single-command mode.
- src/Command/ReviewCommand.php: Orchestrates reading config, loading diff (from file or git), running Pipeline, and optional PR/MR commenting. Uses Symfony Process for git.
- src/Config.php: Loads YAML/JSON config, merges with defaults, expands ${ENV} variables, exposes sections (providers, context, policy, vcs, prompts).
- src/DiffParser.php: Minimal unified diff parser returning added lines per file with accurate line numbers.
- src/Pipeline.php: End-to-end pipeline: parse diff, build AI provider, chunk with token budget, apply policy, and render output.
- src/Adapters/: VcsAdapter interface and GithubAdapter/GitlabAdapter/BitbucketAdapter implementations (resolve branches from PR/MR id and post comments).
- src/Providers/: AIProvider interface and concrete providers (OpenAI, Gemini, Anthropic, Ollama, Mock).
- src/Support/: Core utility classes for enhanced functionality:
  - ChunkBuilder: Intelligent diff chunking with semantic analysis and optimization
  - TokenBudget: Advanced token management with compression and per-file caps
  - ResourceManager: Safe resource handling with automatic cleanup
  - ApiCache: Response caching with TTL and size management
  - InputSanitizer: Security-focused input validation and sanitization
  - DiffProcessor: Enhanced diff processing with filtering capabilities
  - SemanticChunker: Context-aware code chunking for better AI analysis
- src/Config/Constants: Centralized configuration constants replacing magic numbers and strings.
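
To make the DiffParser entry above concrete, here is a minimal sketch of how added lines and their new-file line numbers can be recovered from a unified diff. It is not the project's DiffParser: the function name parseAddedLines is hypothetical, PHP 8.0+ is assumed (for str_starts_with), and edge cases such as added lines whose content itself begins with "++ " are ignored.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the project's DiffParser): collects added lines
 * per file from a unified diff, keyed by their line number in the new file.
 *
 * @return array<string, array<int, string>>
 */
function parseAddedLines(string $diff): array
{
    $added = [];
    $file = null;
    $newLine = 0;

    foreach (preg_split('/\R/', $diff) as $line) {
        if (str_starts_with($line, '+++ ')) {
            // "+++ b/path" names the new file; strip the conventional "b/" prefix.
            $path = substr($line, 4);
            $file = $path === '/dev/null' ? null : preg_replace('#^b/#', '', $path);
            $newLine = 0; // Wait for the next hunk header before counting.
            continue;
        }
        if (preg_match('/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/', $line, $m)) {
            // Hunk header: the number after "+" is the first new-file line of the hunk.
            $newLine = (int) $m[1];
            continue;
        }
        if ($file === null || $newLine === 0) {
            continue; // Outside any hunk (diff headers, index lines, ...).
        }
        if (str_starts_with($line, '+')) {
            $added[$file][$newLine] = substr($line, 1);
            $newLine++;
        } elseif (str_starts_with($line, ' ')) {
            $newLine++; // Context line: present in the new file, but not added.
        }
        // Lines starting with "-" exist only in the old file, so the counter stays put.
    }

    return $added;
}
```

The key point is that only context and added lines advance the new-file counter; removed lines do not, which is what keeps the reported line numbers aligned with the head revision.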

3. Quick start

Install dependencies via Composer:

composer install

Option A: Analyze an existing diff file
- Create or use a unified diff, e.g., examples/sample.diff.
- Run:

php bin/aicr review --diff-file examples/sample.diff --output summary
php bin/aicr review --diff-file examples/sample.diff --output json
php bin/aicr review --diff-file examples/sample.diff --output summary --provider openai

Option B: Analyze a PR/MR by ID using git
- Configure vcs.platform in .aicodereview.yml (github, gitlab, or bitbucket) and set required identifiers/tokens.
- Then run (the command fetches branches, computes diff, and analyzes it):

php bin/aicr review --id 123 --output summary
php bin/aicr review --id 123 --output summary --provider gemini

To also post a comment back to the PR/MR, add --comment:

php bin/aicr review --id 123 --output summary --comment
php bin/aicr review --id 123 --output summary --comment --provider anthropic

Notes

- Provide --config <path> to use a non-default config file.
- Use --provider <name> to override the default provider from config (e.g., openai, gemini, anthropic, ollama, mock).
- Without --diff-file, --id is required and branches are resolved via the configured adapter.

4. Configuration (.aicodereview.yml)

Example (see .aicodereview.yml in this repo and examples/config.*.yml):

version: 1
providers:
  # Safe deterministic provider by default
  default: mock
context:
  diff_token_limit: 8000
  overflow_strategy: trim
  per_file_token_cap: 2000
  enable_semantic_chunking: true
  enable_diff_compression: true
policy:
  min_severity_to_comment: info
  max_comments: 50
  redact_secrets: true
  consolidate_similar_findings: true
  max_findings_per_file: 5
  severity_limits:
    error: 10
    warning: 10
    info: 5
guidelines_file: null
vcs:
  # Set one of: github | gitlab | bitbucket
  platform: null
  # GitHub: owner/repo (optional if GH_REPO env or remote origin is GitHub)
  repo: null
  # GitLab: numeric id or full path namespace/repo (optional if GL_PROJECT_ID or remote origin is GitLab)
  project_id: null
  # GitLab: override API base for self-hosted instances (e.g., https://gitlab.example.com/api/v4)
  api_base: null
  # Bitbucket: workspace name (required for Bitbucket)
  workspace: null
  # Bitbucket: repository name (required for Bitbucket)
  repository: null
  # Bitbucket: access token for authentication (required for Bitbucket)
  accessToken: null
  # Bitbucket: API request timeout in seconds (optional, defaults to 30)
  timeout: 30
prompts:
  # Optional: append additional instructions to the base prompts used by the LLM
  # You can use single strings or lists of strings
  system_append: "Prefer concise findings and avoid duplicates."
  user_append:
    - "Prioritize security and performance related issues."
  extra:
    - "If a secret or key is detected, suggest redaction."
excludes:
  # Array of paths to exclude from code review
  # Each element is treated as glob, regex, or relative path from project root
  # Examples:
  - "*.md"           # Exclude all markdown files (glob)
  - "composer.lock"  # Exclude specific files (exact match)
  - "tests/*.php"    # Exclude files in specific directories with patterns (glob)
  - "vendor"         # Exclude entire vendor directory (directory)
  - "node_modules"   # Exclude node_modules directory (directory)
  - "build"          # Exclude build artifacts (directory)
  - "dist"           # Exclude distribution files (directory)
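
The excludes list above can be pictured with the following sketch. It is an illustration under assumptions, not the tool's matcher: the function name isExcluded is hypothetical, only exact paths, directory prefixes, and globs (via PHP's fnmatch) are handled, and regex entries are deliberately left out. PHP 8.0+ is assumed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns true when $path (relative to the project
 * root) matches one of the exclude patterns.
 * Assumption: regex patterns are not handled in this sketch.
 *
 * @param list<string> $patterns
 */
function isExcluded(string $path, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        $pattern = rtrim($pattern, '/');

        // Exact file match, or the pattern names a directory containing the path.
        if ($path === $pattern || str_starts_with($path, $pattern . '/')) {
            return true;
        }
        // Glob match against the full path, then against the file name alone.
        // Without FNM_PATHNAME, "*" also matches "/" in the path.
        if (fnmatch($pattern, $path) || fnmatch($pattern, basename($path))) {
            return true;
        }
    }

    return false;
}
```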

Notes

- Env var expansion works in any string value: ${VAR_NAME}.
- Tokens and IDs are read from the environment when not set in the config: GH_TOKEN/GITHUB_TOKEN, GL_TOKEN/GITLAB_TOKEN, GH_REPO, GL_PROJECT_ID.
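
As an illustration of the ${VAR_NAME} expansion described in these notes, the sketch below walks a decoded config array and substitutes environment variables in string values. The function name expandEnv is hypothetical, and leaving unset variables untouched is an assumption; the real Config class may behave differently. PHP 8.0+ is assumed (for the mixed type).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: recursively replaces ${VAR_NAME} placeholders in
 * string values with the matching environment variable.
 * Assumption: unset variables are left as-is rather than replaced by "".
 */
function expandEnv(mixed $value): mixed
{
    if (is_array($value)) {
        // array_map with a single array preserves the original keys.
        return array_map('expandEnv', $value);
    }
    if (!is_string($value)) {
        return $value; // Integers, booleans, and null pass through unchanged.
    }

    return preg_replace_callback(
        '/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/',
        static function (array $m): string {
            $env = getenv($m[1]);

            return $env === false ? $m[0] : $env;
        },
        $value
    );
}
```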

5. VCS adapters (GitHub/GitLab/Bitbucket)

Configure vcs.platform and required parameters as needed.
The review command supports a single --id option (PR number for GitHub, MR IID for GitLab, PR ID for Bitbucket).
Behavior when --diff-file is omitted:
1. Resolve base/head branches from the ID via platform API.
2. git fetch --all; fetch base/head; compute git diff base...head.
3. Run the analysis pipeline on that diff.
--comment posts the summary back via the adapter.
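
The git steps above can be sketched as a list of argument vectors, in the shape one might hand to Symfony Process. This is a sketch under assumptions: the function name buildReviewGitCommands is hypothetical, the remote name origin is assumed, and whether the tool diffs remote-tracking refs or local branches is not stated in this README. Nothing is executed here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds the git invocations for a PR/MR review once
 * the adapter has resolved the base and head branch names.
 * Assumption: the remote is called "origin" and remote-tracking refs are diffed.
 *
 * @return list<list<string>> one argv array per command, in execution order
 */
function buildReviewGitCommands(string $base, string $head, string $remote = 'origin'): array
{
    return [
        ['git', 'fetch', '--all'],
        ['git', 'fetch', $remote, $base],
        ['git', 'fetch', $remote, $head],
        // Three-dot form: changes on head since its merge base with base.
        ['git', 'diff', "{$remote}/{$base}...{$remote}/{$head}"],
    ];
}
```

Passing each command as an array rather than a single shell string avoids shell interpolation of branch names.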

6. Coding guidelines file

You can provide a project coding standard or style guide via guidelines_file in .aicodereview.yml.
When set, its content is embedded into the LLM prompts as a base64 string. The prompt explicitly instructs the model to base64-decode the guidelines and follow them strictly during the review.
No provider-specific file uploads are performed: all supported providers (OpenAI, Gemini, Anthropic, Ollama) receive the same base64-embedded guidelines in the prompt.
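
A minimal sketch of the base64 embedding described above follows. The function name buildGuidelinesBlock, the GUIDELINES_BASE64 label, and the instruction wording are all hypothetical; the actual prompt text used by the tool is not shown in this README.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wraps the guidelines file content into a prompt
 * fragment. The label and wording are illustrative assumptions.
 */
function buildGuidelinesBlock(string $guidelines): string
{
    $encoded = base64_encode($guidelines);

    return "The project coding guidelines follow as a base64 string. "
        . "Decode them and follow them strictly during the review.\n"
        . "GUIDELINES_BASE64: {$encoded}";
}
```

Encoding keeps the guidelines as a single opaque token run, so quotes, newlines, or markdown inside the file cannot break the surrounding prompt structure.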

7. AI providers and token budgeting

Supported providers in this repository: openai, gemini, anthropic, ollama, mock.
Select via providers.default and configure each provider section accordingly (see src/Providers/* for options).
Token budgeting is approximate (chars/4). Global and per-file caps are configurable; overflow_strategy defaults to trim.
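
The chars/4 approximation and the trim strategy can be illustrated as follows. This is a sketch, not the project's TokenBudget: the function names are hypothetical, rounding up is an assumption, and byte length (strlen) stands in for character count.

```php
<?php
declare(strict_types=1);

/** Approximate token count: one token per 4 bytes, rounded up (assumption). */
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

/**
 * Hypothetical helper: trims each file diff to the per-file cap and stops
 * once the global limit is used up ("trim" overflow strategy).
 *
 * @param array<string, string> $fileDiffs path => diff text
 * @return array<string, string> diffs that fit the budget, possibly trimmed
 */
function applyBudget(array $fileDiffs, int $globalLimit, int $perFileCap): array
{
    $kept = [];
    $remaining = $globalLimit;

    foreach ($fileDiffs as $path => $diff) {
        if ($remaining <= 0) {
            break; // Global budget exhausted; later files are dropped.
        }
        $allowed = min($perFileCap, $remaining);
        if (estimateTokens($diff) > $allowed) {
            // Cut at a byte boundary; multi-byte characters may be split.
            $diff = substr($diff, 0, $allowed * 4);
        }
        $kept[$path] = $diff;
        $remaining -= estimateTokens($diff);
    }

    return $kept;
}
```

With the example config above, this would correspond to applyBudget($diffs, 8000, 2000).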

7.1 Advanced Token Optimization Features

The system includes several token cost optimization features:

- Semantic Chunking: Enable with enable_semantic_chunking: true to group related code changes by context (classes, methods, etc.).
- Diff Compression: Enable with enable_diff_compression: true to compress diffs while preserving their semantic meaning.
- Trivial Change Filtering: Automatically filters out whitespace-only changes, TODO comments, and import statements.
- Similar Finding Consolidation: Set consolidate_similar_findings: true to aggregate similar issues across multiple files.
- Per-file Limits: Set max_findings_per_file to cap the number of findings reported for each file.
- Severity Limits: Set severity_limits to cap the number of findings per severity level.

These optimizations can reduce token usage by 30-50% for input and 40-60% for output while maintaining review quality. See docs/token-cost-optimization.md for a detailed implementation guide.
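
To show how the per-file and severity limits interact, here is a sketch of a policy filter over a list of findings. The function name applyPolicy and the finding shape (file, line, severity, message) are hypothetical; only the policy keys come from the configuration example in section 4. Findings are assumed to arrive already ordered by priority.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: keeps findings that pass the minimum severity and
 * fit within the per-file, per-severity, and overall caps.
 *
 * @param list<array{file: string, line: int, severity: string, message: string}> $findings
 * @param array<string, mixed> $policy the "policy" section of the config
 */
function applyPolicy(array $findings, array $policy): array
{
    $rank = ['info' => 0, 'warning' => 1, 'error' => 2];
    $minRank = $rank[$policy['min_severity_to_comment'] ?? 'info'] ?? 0;
    $perFile = [];
    $perSeverity = [];
    $kept = [];

    foreach ($findings as $finding) {
        $severity = $finding['severity'];
        $file = $finding['file'];

        if (count($kept) >= ($policy['max_comments'] ?? PHP_INT_MAX)) {
            break; // Overall cap reached.
        }
        if (($rank[$severity] ?? -1) < $minRank) {
            continue; // Below the threshold (unknown severities are dropped).
        }
        if (($perFile[$file] ?? 0) >= ($policy['max_findings_per_file'] ?? PHP_INT_MAX)) {
            continue;
        }
        if (($perSeverity[$severity] ?? 0) >= ($policy['severity_limits'][$severity] ?? PHP_INT_MAX)) {
            continue;
        }

        $kept[] = $finding;
        $perFile[$file] = ($perFile[$file] ?? 0) + 1;
        $perSeverity[$severity] = ($perSeverity[$severity] ?? 0) + 1;
    }

    return $kept;
}
```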

8. Security & Performance

This section summarizes the tool's security hardening, performance optimizations, and code quality improvements:

Security Enhancements

InputSanitizer: Comprehensive input validation and sanitization for all external data
- Branch name, repository name, and file path validation
- API response sanitization to prevent injection attacks
- URL and commit SHA validation with strict patterns
Resource Management: Safe resource handling with automatic cleanup
- Temporary file and directory management
- Resource leak prevention with shutdown handlers
- Exception-safe cleanup with try-finally patterns
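
The kind of strict-pattern validation listed above can be sketched as follows. These are not the InputSanitizer methods: the function names are hypothetical, the branch allowlist is deliberately stricter than what git itself accepts, and the SHA check assumes abbreviated or full SHA-1 hashes. PHP 8.0+ is assumed.

```php
<?php
declare(strict_types=1);

/** Hypothetical check: 7 to 40 hex characters (abbreviated or full SHA-1). */
function isValidCommitSha(string $sha): bool
{
    return preg_match('/\A[0-9a-f]{7,40}\z/i', $sha) === 1;
}

/**
 * Hypothetical check: conservative allowlist for branch names.
 * Assumption: rejecting some names git would accept is preferable to
 * passing unexpected characters to a git command line.
 */
function isValidBranchName(string $name): bool
{
    // Letters, digits, dot, underscore, slash, and hyphen only.
    if (preg_match('#\A[A-Za-z0-9._/-]{1,255}\z#', $name) !== 1) {
        return false;
    }

    return !str_starts_with($name, '-')   // Would be parsed as a git option.
        && !str_starts_with($name, '/')
        && !str_ends_with($name, '/')
        && !str_ends_with($name, '.lock')
        && !str_contains($name, '..')     // Range syntax / path traversal.
        && !str_contains($name, '//');
}
```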

Performance Optimizations

Intelligent Chunking: Enhanced ChunkBuilder with semantic analysis
- Batch processing for better memory management
- Parallel-friendly architecture for large diffs
- Context-aware chunking for improved AI analysis
Advanced Token Management: Improved TokenBudget with compression
- Per-file token caps to prevent oversized chunks
- Diff compression for large files
- Smart budget allocation and overflow handling
API Response Caching: New ApiCache system for improved performance
- TTL-based caching with automatic expiration
- Size-limited cache with LRU eviction
- Request deduplication and response reuse
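
The TTL and LRU behavior described for ApiCache can be illustrated with a small in-memory sketch. The class name TinyCache and its methods are hypothetical, and the real ApiCache may persist to disk or track size differently; the optional $now parameter exists only to make the example deterministic. PHP 8.0+ is assumed.

```php
<?php
declare(strict_types=1);

/** Hypothetical in-memory cache with TTL expiry and size-bound LRU eviction. */
final class TinyCache
{
    /** @var array<string, array{value: string, expires: int}> oldest first */
    private array $items = [];
    private int $bytes = 0;

    public function __construct(private int $ttl, private int $maxBytes)
    {
    }

    public function get(string $key, ?int $now = null): ?string
    {
        $now ??= time();
        if (!isset($this->items[$key])) {
            return null;
        }
        $item = $this->items[$key];
        $this->remove($key);
        if ($item['expires'] <= $now) {
            return null; // Expired entries are dropped on access.
        }
        // Re-append so the entry becomes the most recently used.
        $this->items[$key] = $item;
        $this->bytes += strlen($item['value']);

        return $item['value'];
    }

    public function set(string $key, string $value, ?int $now = null): void
    {
        $now ??= time();
        $this->remove($key);
        if (strlen($value) > $this->maxBytes) {
            return; // Larger than the whole cache; not stored.
        }
        $this->items[$key] = ['value' => $value, 'expires' => $now + $this->ttl];
        $this->bytes += strlen($value);
        while ($this->bytes > $this->maxBytes) {
            // The first array key is the least recently used entry.
            $this->remove((string) array_key_first($this->items));
        }
    }

    private function remove(string $key): void
    {
        if (isset($this->items[$key])) {
            $this->bytes -= strlen($this->items[$key]['value']);
            unset($this->items[$key]);
        }
    }
}
```

With the options shown in the configuration block below, the equivalent construction would be new TinyCache(3600, 52428800).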

Code Quality Improvements

Constants Centralization: All magic numbers and strings moved to Constants class
Enhanced Error Handling: Standardized exception handling across all providers
Improved Documentation: Comprehensive PHPDoc comments and inline documentation
Security Audit: Fixed potential security issues identified in code review

Configuration Enhancements

New configuration options available:

context:
  enable_semantic_chunking: true    # Enable context-aware chunking
  enable_diff_compression: true     # Enable diff compression for large files
  cache_ttl: 3600                  # API response cache TTL in seconds
  max_cache_size: 52428800         # Maximum cache size in bytes (50MB)

9. Output formats

json (default): machine-readable findings array.
summary: human-readable bulleted list. This is also the format used for PR/MR comments.
markdown: structured markdown format with emojis, metadata, and organized findings by severity and file.
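
As a rough picture of the json and summary formats, the sketch below renders a list of findings either way. The function name renderFindings and the finding fields (file, line, severity, message) are hypothetical; the actual schema is defined by the tool and is not reproduced in this README. The markdown format is omitted.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical renderer for two of the output formats.
 *
 * @param list<array{file: string, line: int, severity: string, message: string}> $findings
 */
function renderFindings(array $findings, string $format): string
{
    if ($format === 'json') {
        return json_encode(
            $findings,
            JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
        );
    }
    if ($format === 'summary') {
        if ($findings === []) {
            return 'No findings.';
        }
        $lines = [];
        foreach ($findings as $f) {
            $lines[] = sprintf('- [%s] %s:%d %s', $f['severity'], $f['file'], $f['line'], $f['message']);
        }

        return implode("\n", $lines);
    }

    throw new InvalidArgumentException("Unsupported format in this sketch: {$format}");
}
```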

10. Development and QA

Requires PHP and Composer.
Run unit and E2E tests with PHPUnit:

./vendor/bin/phpunit

Coding standards and static analysis:

composer analyse

The codebase uses declare(strict_types=1) and Symfony components (Console, YAML, Filesystem, Process).

11. Credits

Author: Raffaele Carelle
Contributors: Thanks to everyone who reports issues or submits PRs.

12. License

This project is open-sourced under the MIT License. See the LICENSE file for details.
