🎯 Feature: Enhanced RAG Evaluation CLI with Two-Stage Pipeline and Improved UX

@danielaskdd (Collaborator)

Summary

This PR significantly improves the LightRAG evaluation system by adding command-line argument support, implementing a two-stage concurrent pipeline for better performance, and enhancing user experience with real-time progress visualization.

🚀 Key Improvements

1. Command-Line Arguments Support

  • Added argparse for a proper CLI interface (a minimal sketch follows this list)
  • --dataset / -d specifies a custom test dataset
  • --ragendpoint / -r specifies a custom RAG API endpoint
  • Comprehensive --help documentation
  • Backward compatible with the existing environment variables
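
For illustration, here is a minimal sketch of what the argparse wiring might look like. The flag names match this PR; the environment-variable names and the default endpoint are assumptions, since the PR only states that env vars remain supported:

```python
import argparse
import os


def parse_args():
    parser = argparse.ArgumentParser(
        description="Evaluate RAG answer quality against a test dataset."
    )
    # Flag names are from this PR; the env-var fallbacks below are
    # hypothetical names used only to show the backward-compat pattern.
    parser.add_argument(
        "--dataset", "-d",
        default=os.getenv("EVAL_DATASET"),
        help="Path to a JSON test dataset (defaults to the bundled one).",
    )
    parser.add_argument(
        "--ragendpoint", "-r",
        default=os.getenv("RAG_ENDPOINT", "http://localhost:9621"),
        help="Base URL of the RAG API endpoint to query.",
    )
    return parser.parse_args()
```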

2. Two-Stage Pipeline Architecture

  • Stage 1: RAG response generation, run at twice the evaluation concurrency limit
  • Stage 2: RAGAS evaluation, gated by its own semaphore
  • Caps in-flight responses so a fast RAG backend cannot flood the slower evaluation stage
  • Better resource utilization and throughput (a minimal sketch follows this list)
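
A minimal sketch of the two-stage scheme under asyncio. The semaphore names mirror the commit messages below; fetch_rag_response and evaluate_with_ragas are assumed helpers, and the rest is illustrative rather than the script's actual code:

```python
import asyncio
import os

# Stage 2 concurrency comes from EVAL_MAX_CONCURRENT (default 2 per this PR);
# stage 1 runs at twice that limit.
EVAL_MAX_CONCURRENT = int(os.getenv("EVAL_MAX_CONCURRENT", "2"))
rag_semaphore = asyncio.Semaphore(EVAL_MAX_CONCURRENT * 2)
eval_semaphore = asyncio.Semaphore(EVAL_MAX_CONCURRENT)


async def run_case(case):
    # rag_semaphore wraps the whole task, so at most 2x the eval limit of
    # generated responses are ever held in memory while evals are slow.
    async with rag_semaphore:
        answer = await fetch_rag_response(case)  # assumed helper
        # The tighter eval_semaphore keeps RAGAS from being overwhelmed
        # when the RAG backend answers faster than evaluation completes.
        async with eval_semaphore:
            return await evaluate_with_ragas(case, answer)  # assumed helper


async def main(cases):
    return await asyncio.gather(*(run_case(c) for c in cases))
```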

3. Real-Time Progress Visualization

  • Added tqdm progress bars for each concurrent evaluation
  • Thread-safe progress bar management with position pooling (sketched after this list)
  • Lock-protected bar creation to prevent display conflicts
  • Clear visual feedback on evaluation progress
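
An illustrative sketch of the position-pool pattern in an asyncio context; the function and variable names here are assumptions, not taken from the actual script:

```python
import asyncio

from tqdm.auto import tqdm  # tqdm.auto picks the best display backend

EVAL_MAX_CONCURRENT = 2  # this PR's default concurrency

# Each concurrent task borrows a fixed terminal row (a tqdm "position")
# from a pool, so bars never fight over the same line.
position_pool: asyncio.Queue = asyncio.Queue()
for pos in range(EVAL_MAX_CONCURRENT):
    position_pool.put_nowait(pos)
tqdm_lock = asyncio.Lock()


async def evaluate_with_bar(case, steps):
    position = await position_pool.get()
    async with tqdm_lock:  # serialize creation to avoid garbled output
        bar = tqdm(total=steps, position=position, leave=False,
                   desc=str(case.get("id", "case")))
    try:
        for _ in range(steps):
            await run_step(case)  # assumed helper doing one eval step
            bar.update(1)
    finally:
        bar.close()                         # leave=False clears the row
        position_pool.put_nowait(position)  # return the row to the pool
```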

4. Enhanced Documentation

  • Comprehensive README updates with usage examples
  • CLI argument reference table
  • Multiple usage scenarios documented
  • Better onboarding for new users

5. Improved Test Dataset

  • Added 3 new comprehensive test cases (an example shape is sketched after this list) covering:
    • Vector database support and characteristics
    • RAG evaluation metrics explanation
    • LightRAG core benefits and improvements
  • Unified project naming for better organization
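
For illustration, one dataset entry might look like the following; the field names are hypothetical, since the PR does not show the schema:

```python
# Hypothetical shape of one test case (field names are assumptions).
sample_case = {
    "question": "Which vector databases does LightRAG support, and what "
                "are their characteristics?",
    "ground_truth": "LightRAG supports multiple vector storage backends; "
                    "each differs in scalability and deployment model ...",
}
```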

💡 Usage Examples

```bash
# Use defaults
python lightrag/evaluation/eval_rag_quality.py

# Custom dataset
python lightrag/evaluation/eval_rag_quality.py --dataset my_test.json

# Custom endpoint
python lightrag/evaluation/eval_rag_quality.py -r http://my-server:9621

# Both custom
python lightrag/evaluation/eval_rag_quality.py -d my_test.json -r http://localhost:9621

# Get help
python lightrag/evaluation/eval_rag_quality.py --help
```

🧪 Testing Recommendations

  • Test with various dataset sizes (small, medium, large)
  • Verify concurrent evaluation with different EVAL_MAX_CONCURRENT values
  • Test CLI arguments in different combinations
  • Validate progress bar display in different terminal environments

📝 Commit Summary

  • Add --dataset and --ragendpoint flags
  • Support short forms -d and -r
  • Update README with usage examples

  • Lower concurrent evals from 3 to 2
  • Standardize project names in samples
  • Add 3 new evaluation questions
  • Expand ground-truth detail coverage
  • Improve dataset comprehensiveness

  • Add tqdm progress bar for eval steps
  • Pass progress bar to RAGAS evaluate
  • Ensure progress bar cleanup in finally
  • Remove redundant output buffer flushes

  • Split RAG generation and eval stages
  • Add rag_semaphore for stage 1
  • Add eval_semaphore for stage 2
  • Improve concurrency control
  • Update connection pool limits

  • Move rag_semaphore to wrap the full function
  • Increase RAG concurrency to 2x the eval limit
  • Prevent memory buildup from slow evals
  • Keep eval_semaphore for RAGAS control

  • Add position pool for tqdm bars
  • Serialize tqdm creation with a lock
  • Set leave=False to clear completed bars
  • Pass position/lock to eval tasks
  • Import tqdm.auto for better display
@danielaskdd merged commit eb80771 into HKUDS:main on Nov 4, 2025 (1 check passed).
@danielaskdd deleted the evalueate-cli branch on November 5, 2025 at 02:44.