🎯 Feature: Enhanced RAG Evaluation CLI with Two-Stage Pipeline and Improved UX

@danielaskdd (Collaborator)

Summary

This PR significantly improves the LightRAG evaluation system by adding command-line argument support, implementing a two-stage concurrent pipeline for better performance, and enhancing user experience with real-time progress visualization.

🚀 Key Improvements

1. Command-Line Arguments Support

  • Added argparse for a proper CLI interface (a minimal sketch follows this list)
  • --dataset / -d specifies a custom test dataset
  • --ragendpoint / -r specifies a custom RAG API endpoint
  • Comprehensive --help documentation
  • Backward compatible with the existing environment variables
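
For illustration, here is a minimal sketch of what the argparse wiring might look like. The flag names match this PR; the environment-variable names and the default endpoint are assumptions, since the PR only states that env vars remain supported:

```python
import argparse
import os


def parse_args():
    parser = argparse.ArgumentParser(
        description="Evaluate RAG answer quality against a test dataset."
    )
    # Flag names are from this PR; the env-var fallbacks below are
    # hypothetical names used only to show the backward-compat pattern.
    parser.add_argument(
        "--dataset", "-d",
        default=os.getenv("EVAL_DATASET"),
        help="Path to a JSON test dataset (defaults to the bundled one).",
    )
    parser.add_argument(
        "--ragendpoint", "-r",
        default=os.getenv("RAG_ENDPOINT", "http://localhost:9621"),
        help="Base URL of the RAG API endpoint to query.",
    )
    return parser.parse_args()
```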

2. Two-Stage Pipeline Architecture

  • Stage 1: RAG response generation, run at twice the evaluation concurrency limit
  • Stage 2: RAGAS evaluation, gated by its own semaphore
  • Caps in-flight responses so a fast RAG backend cannot flood the slower evaluation stage
  • Better resource utilization and throughput (a minimal sketch follows this list)
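
A minimal sketch of the two-stage scheme under asyncio. The semaphore names mirror the commit messages below; fetch_rag_response and evaluate_with_ragas are assumed helpers, and the rest is illustrative rather than the script's actual code:

```python
import asyncio
import os

# Stage 2 concurrency comes from EVAL_MAX_CONCURRENT (default 2 per this PR);
# stage 1 runs at twice that limit.
EVAL_MAX_CONCURRENT = int(os.getenv("EVAL_MAX_CONCURRENT", "2"))
rag_semaphore = asyncio.Semaphore(EVAL_MAX_CONCURRENT * 2)
eval_semaphore = asyncio.Semaphore(EVAL_MAX_CONCURRENT)


async def run_case(case):
    # rag_semaphore wraps the whole task, so at most 2x the eval limit of
    # generated responses are ever held in memory while evals are slow.
    async with rag_semaphore:
        answer = await fetch_rag_response(case)  # assumed helper
        # The tighter eval_semaphore keeps RAGAS from being overwhelmed
        # when the RAG backend answers faster than evaluation completes.
        async with eval_semaphore:
            return await evaluate_with_ragas(case, answer)  # assumed helper


async def main(cases):
    return await asyncio.gather(*(run_case(c) for c in cases))
```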

3. Real-Time Progress Visualization

  • Added tqdm progress bars for each concurrent evaluation
  • Thread-safe progress bar management with position pooling (sketched after this list)
  • Lock-protected bar creation to prevent display conflicts
  • Clear visual feedback on evaluation progress
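
An illustrative sketch of the position-pool pattern in an asyncio context; the function and variable names here are assumptions, not taken from the actual script:

```python
import asyncio

from tqdm.auto import tqdm  # tqdm.auto picks the best display backend

EVAL_MAX_CONCURRENT = 2  # this PR's default concurrency

# Each concurrent task borrows a fixed terminal row (a tqdm "position")
# from a pool, so bars never fight over the same line.
position_pool: asyncio.Queue = asyncio.Queue()
for pos in range(EVAL_MAX_CONCURRENT):
    position_pool.put_nowait(pos)
tqdm_lock = asyncio.Lock()


async def evaluate_with_bar(case, steps):
    position = await position_pool.get()
    async with tqdm_lock:  # serialize creation to avoid garbled output
        bar = tqdm(total=steps, position=position, leave=False,
                   desc=str(case.get("id", "case")))
    try:
        for _ in range(steps):
            await run_step(case)  # assumed helper doing one eval step
            bar.update(1)
    finally:
        bar.close()                         # leave=False clears the row
        position_pool.put_nowait(position)  # return the row to the pool
```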

4. Enhanced Documentation

  • Comprehensive README updates with usage examples
  • CLI argument reference table
  • Multiple usage scenarios documented
  • Better onboarding for new users

5. Improved Test Dataset

  • Added 3 new comprehensive test cases (an example shape is sketched after this list) covering:
    • Vector database support and characteristics
    • RAG evaluation metrics explanation
    • LightRAG core benefits and improvements
  • Unified project naming for better organization
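
For illustration, one dataset entry might look like the following; the field names are hypothetical, since the PR does not show the schema:

```python
# Hypothetical shape of one test case (field names are assumptions).
sample_case = {
    "question": "Which vector databases does LightRAG support, and what "
                "are their characteristics?",
    "ground_truth": "LightRAG supports multiple vector storage backends; "
                    "each differs in scalability and deployment model ...",
}
```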

💡 Usage Examples

```bash
# Use defaults
python lightrag/evaluation/eval_rag_quality.py

# Custom dataset
python lightrag/evaluation/eval_rag_quality.py --dataset my_test.json

# Custom endpoint
python lightrag/evaluation/eval_rag_quality.py -r http://my-server:9621

# Both custom
python lightrag/evaluation/eval_rag_quality.py -d my_test.json -r http://localhost:9621

# Get help
python lightrag/evaluation/eval_rag_quality.py --help
```

🧪 Testing Recommendations

  • Test with various dataset sizes (small, medium, large)
  • Verify concurrent evaluation with different EVAL_MAX_CONCURRENT values
  • Test CLI arguments in different combinations
  • Validate progress bar display in different terminal environments

📝 Commit Summary

  • Add --dataset and --ragendpoint flags
  • Support short forms -d and -r
  • Update README with usage examples

  • Lower concurrent evals from 3 to 2
  • Standardize project names in samples
  • Add 3 new evaluation questions
  • Expand ground-truth detail coverage
  • Improve dataset comprehensiveness

  • Add tqdm progress bar for eval steps
  • Pass progress bar to RAGAS evaluate
  • Ensure progress bar cleanup in finally
  • Remove redundant output buffer flushes

  • Split RAG generation and eval stages
  • Add rag_semaphore for stage 1
  • Add eval_semaphore for stage 2
  • Improve concurrency control
  • Update connection pool limits

  • Move rag_semaphore to wrap the full function
  • Increase RAG concurrency to 2x the eval limit
  • Prevent memory buildup from slow evals
  • Keep eval_semaphore for RAGAS control

  • Add position pool for tqdm bars
  • Serialize tqdm creation with a lock
  • Set leave=False to clear completed bars
  • Pass position/lock to eval tasks
  • Import tqdm.auto for better display
@danielaskdd merged commit eb80771 into HKUDS:main on Nov 4, 2025 (1 check passed).
@danielaskdd deleted the evalueate-cli branch on November 5, 2025 at 02:44.