Quick Start · Core Modules · FAQ
🇨🇳 中文 · 🇯🇵 日本語 · 🇪🇸 Español · 🇫🇷 Français · 🇸🇦 العربية · 🇷🇺 Русский · 🇮🇳 हिन्दी · 🇵🇹 Português
📚 Massive Document Knowledge Q&A • 🎨 Interactive Learning Visualization
🎯 Knowledge Reinforcement • 🔍 Deep Research & Idea Generation
[2026.1.1] Happy New Year! Join our GitHub Discussions — shape the future of DeepTutor! 💬
[2025.12.30] Visit our Official Website for more details!
[2025.12.29] DeepTutor v0.1 is now live! ✨
• Smart Knowledge Base: Upload textbooks, research papers, technical manuals, and domain-specific documents. Build a comprehensive AI-powered knowledge repository for instant access.
• Multi-Agent Problem Solving: Dual-loop reasoning architecture with RAG, web search, and code execution -- delivering step-by-step solutions with precise citations.
• Knowledge Simplification & Explanations: Transform complex concepts, knowledge, and algorithms into easy-to-understand visual aids, detailed step-by-step breakdowns, and engaging interactive demonstrations.
• Personalized Q&A: Context-aware conversations that adapt to your learning progress, with interactive pages and session-based knowledge tracking.
• Intelligent Exercise Creation: Generate targeted quizzes, practice problems, and customized assessments tailored to your current knowledge level and specific learning objectives.
• Authentic Exam Simulation: Upload reference exams to generate practice questions that closely match the original style, format, and difficulty, giving you realistic preparation for the actual test.
• Comprehensive Research & Literature Review: Conduct in-depth topic exploration with systematic analysis. Identify patterns, connect related concepts across disciplines, and synthesize existing research findings.
• Novel Insight Discovery: Generate structured learning materials and uncover knowledge gaps. Identify promising new research directions through intelligent cross-domain knowledge synthesis.
Multi-agent Problem Solving with Exact Citations
Step-by-step Visual Explanations with Personal QAs
Custom Questions
Mimic Questions
Personal Knowledge Base
Personal Notebook
🌙 Use DeepTutor in Dark Mode!
• Intuitive Interaction: Simple bidirectional query-response flow.
• Structured Output: Structured response generation that organizes complex information into actionable outputs.
• Problem Solving & Assessment: Step-by-step problem solving and custom assessment generation.
• Research & Learning: Deep Research for topic exploration and Guided Learning with visualization.
• Idea Generation: Automated and interactive concept development with multi-source insights.
• Information Retrieval: RAG hybrid retrieval, real-time web search, and academic paper databases.
• Processing & Analysis: Python code execution, query item lookup, and PDF parsing for document analysis.
• Knowledge Graph: Entity-relation mapping for semantic connections and knowledge discovery.
• Vector Store: Embedding-based semantic search for intelligent content retrieval.
• Memory System: Session state management and citation tracking for contextual continuity.
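To make the vector-store item above concrete, here is a minimal, dependency-free sketch of embedding-based semantic search. The toy 3-dimensional "embeddings" and chunk names are illustrative only; real embeddings (e.g., text-embedding-3-large) have thousands of dimensions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec: list[float], store: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query embedding."""
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Toy 3-dimensional "embeddings" (real ones are far larger)
store = {
    "chunk_convolution": [0.9, 0.1, 0.0],
    "chunk_attention":   [0.1, 0.9, 0.1],
    "chunk_history":     [0.0, 0.1, 0.9],
}
print(search([0.8, 0.2, 0.0], store))  # most similar chunks first
```

The same ranking idea underlies any embedding store: embed the query, score it against stored chunk embeddings, and return the top matches.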
🌟 Star to follow our future updates!
- Support Local LLM Services (e.g., ollama)
- Refactor RAG Module (see Discussions)
- Deep-coding from idea generation
- Personalized Interaction with Notebook
# Clone the repository
git clone https://github.com/HKUDS/DeepTutor.git
cd DeepTutor
# Set Up Virtual Environment (Choose One Option)
# Option A: Using conda (Recommended)
conda create -n deeptutor python=3.10
conda activate deeptutor
# Option B: Using venv
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

Run the automated installation script to install all required dependencies:
# Recommended
bash scripts/install_all.sh
# Alternative Python Scripts
python scripts/install_all.py
# Or Install Dependencies Manually
pip install -r requirements.txt
npm install

Create a .env file in the project root directory based on .env.example:
# Copy from .env.example template (if exists)
cp .env.example .env
# Then edit .env file with your API keys.

By default, the application uses:

- Backend (FastAPI): 8001
- Frontend (Next.js): 3782
You can modify these ports in config/main.yaml by editing the server.backend_port and server.frontend_port values.
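The relevant fragment of config/main.yaml might look like the following sketch. Only the two key names (server.backend_port, server.frontend_port) and the default values are documented above; the surrounding layout is an assumption.

```yaml
# config/main.yaml (sketch -- only the documented keys shown)
server:
  backend_port: 8001   # FastAPI backend
  frontend_port: 3782  # Next.js frontend
```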
LLM Configuration: Agent settings for temperature and max_tokens are centralized in config/agents.yaml. Each module (guide, solve, research, question, ideagen, co_writer) has customizable parameters. See Configuration Documentation for details.
Try the system quickly with two pre-built knowledge bases, plus a collection of challenging questions and usage examples.
Research Papers Collection — 5 papers (20-50 pages each)
A curated collection of 5 research papers from our lab covering RAG and Agent fields. This demo showcases broad knowledge coverage for research scenarios.
Used Papers: AI-Researcher | AutoAgent | RAG-Anything | LightRAG | VideoRAG
Data Science Textbook — 8 chapters, 296 pages
A comprehensive data science textbook with challenging content. This demo showcases deep knowledge depth for learning scenarios.
Book Link: Deep Representation Learning Book
Download and Setup:
- Download the demo package: Google Drive
- Extract the compressed files directly into the data/ directory
- Knowledge bases will be automatically available once you start the system

Note: Our demo knowledge bases use text-embedding-3-large with dimensions = 3072. Ensure your embedding model has matching dimensions (3072) for compatibility.
# Activate virtual environment
conda activate deeptutor  # or: source venv/bin/activate
# Start web interface (frontend + backend)
python scripts/start_web.py
# Alternative: CLI interface only
python scripts/start.py
# Stop the service: Ctrl+C

Create custom knowledge bases through the web interface with support for multiple file formats.
- Access Knowledge Base: Navigate to http://localhost:{frontend_port}/knowledge
- Create New Base: Click "New Knowledge Base"
- Configure Settings: Enter a unique name for your knowledge base
- Upload Content: Add single or multiple files for batch processing
- Monitor Progress: Track processing status in the terminal running start_web.py
- Knowledge base becomes available once processing finishes

Tips: Large files may require several minutes to process. Multiple files can be uploaded simultaneously for efficient batch processing.
| Service | URL | Description |
|---|---|---|
| Frontend | http://localhost:{frontend_port} | Main web interface |
| API Docs | http://localhost:{backend_port}/docs | Interactive API documentation |
| Health | http://localhost:{backend_port}/api/v1/knowledge/health | System health check |
All user content and system data are stored in the data/ directory:
data/
├── knowledge_bases/ # Knowledge base storage
└── user/ # User activity data
├── solve/ # Problem solving results and artifacts
├── question/ # Generated questions
├── research/ # Research reports and cache
├── co-writer/ # Interactive IdeaGen documents and audio files
├── notebook/ # Notebook records and metadata
├── guide/ # Guided learning sessions
├── logs/ # System logs
└── run_code_workspace/ # Code execution workspace
Results are automatically saved during all activities. Directories are created automatically as needed.
🧠 Smart Solver
Intelligent problem-solving system based on Analysis Loop + Solve Loop dual-loop architecture, supporting multi-mode reasoning and dynamic knowledge retrieval.
Core Features
| Feature | Description |
|---|---|
| Dual-Loop Architecture | Analysis Loop: InvestigateAgent → NoteAgent. Solve Loop: PlanAgent → ManagerAgent → SolveAgent → CheckAgent → Format |
| Multi-Agent Collaboration | Specialized agents: InvestigateAgent, NoteAgent, PlanAgent, ManagerAgent, SolveAgent, CheckAgent |
| Real-time Streaming | WebSocket transmission with live reasoning process display |
| Tool Integration | RAG (naive/hybrid), Web Search, Query Item, Code Execution |
| Persistent Memory | JSON-based memory files for context preservation |
| Citation Management | Structured citations with reference tracking |
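The persistent-memory row above refers to JSON memory files (e.g., investigate_memory.json, citation_memory.json in the output directory below). A minimal sketch of how such JSON-backed persistence can work, with a hypothetical save/load helper that is not DeepTutor's actual API:

```python
import json
import tempfile
from pathlib import Path

def save_memory(path: Path, memory: dict) -> None:
    """Persist agent memory as pretty-printed JSON, creating parent dirs."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(memory, indent=2, ensure_ascii=False))

def load_memory(path: Path) -> dict:
    """Load memory from disk; an absent file yields an empty context."""
    return json.loads(path.read_text()) if path.exists() else {}

# Stand-in for a solve_YYYYMMDD_HHMMSS run directory
run_dir = Path(tempfile.mkdtemp()) / "solve_example"
save_memory(run_dir / "investigate_memory.json", {"notes": ["x is a discrete signal"], "citations": []})
print(load_memory(run_dir / "investigate_memory.json")["notes"])
```

Because the state is plain JSON on disk, a later loop (or a restarted session) can reload the full context without re-running earlier agents.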
Usage
- Visit http://localhost:{frontend_port}/solver
- Select a knowledge base
- Enter your question, click "Solve"
- Watch the real-time reasoning process and final answer
Python API
import asyncio
from src.agents.solve import MainSolver
async def main():
    solver = MainSolver(kb_name="ai_textbook")
    result = await solver.solve(
        question="Calculate the linear convolution of x=[1,2,3] and h=[4,5]",
        mode="auto"
    )
    print(result['formatted_solution'])

asyncio.run(main())

Output Location
data/user/solve/solve_YYYYMMDD_HHMMSS/
├── investigate_memory.json # Analysis Loop memory
├── solve_chain.json # Solve Loop steps & tool records
├── citation_memory.json # Citation management
├── final_answer.md # Final solution (Markdown)
├── performance_report.json # Performance monitoring
└── artifacts/ # Code execution outputs
📝 Question Generator
Dual-mode question generation system supporting custom knowledge-based generation and reference exam paper mimicking with automatic validation.
Core Features
| Feature | Description |
|---|---|
| Custom Mode | Background Knowledge → Question Planning → Generation → Single-Pass Validation. Analyzes question relevance without rejection logic |
| Mimic Mode | PDF Upload → MinerU Parsing → Question Extraction → Style Mimicking. Generates questions based on reference exam structure |
| ReAct Engine | QuestionGenerationAgent with autonomous decision-making (think → act → observe) |
| Validation Analysis | Single-pass relevance analysis with kb_coverage and extension_points |
| Question Types | Multiple choice, fill-in-the-blank, calculation, written response, etc. |
| Batch Generation | Parallel processing with progress tracking |
| Complete Persistence | All intermediate files saved (background knowledge, plan, individual results) |
| Timestamped Output | Mimic mode creates batch folders: mimic_YYYYMMDD_HHMMSS_{pdf_name}/ |
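The timestamped batch folders named in the table (mimic_YYYYMMDD_HHMMSS_{pdf_name}/) can be built with a few lines of standard-library Python; this helper is an illustrative sketch, not the project's actual code:

```python
from datetime import datetime
from pathlib import Path

def mimic_batch_dir(output_root: str, pdf_name: str) -> Path:
    """Build a batch folder path like mimic_YYYYMMDD_HHMMSS_{pdf_name}."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # e.g. 20251229_143055
    return Path(output_root) / f"mimic_{stamp}_{pdf_name}"

print(mimic_batch_dir("data/user/question/mimic_papers", "midterm"))
```

Using a timestamp in the folder name keeps repeated runs against the same PDF from overwriting each other.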
Usage
Custom Mode:
- Visit http://localhost:{frontend_port}/question
- Fill in requirements (topic, difficulty, question type, count)
- Click "Generate Questions"
- View generated questions with validation reports
Mimic Mode:
- Visit http://localhost:{frontend_port}/question
- Switch to "Mimic Exam" tab
- Upload PDF or provide parsed exam directory
- Wait for parsing → extraction → generation
- View generated questions alongside original references
Python API
Custom Mode - Full Pipeline:
import asyncio
from src.agents.question import AgentCoordinator
async def main():
    coordinator = AgentCoordinator(
        kb_name="ai_textbook",
        output_dir="data/user/question"
    )

    # Generate multiple questions from text requirement
    result = await coordinator.generate_questions_custom(
        requirement_text="Generate 3 medium-difficulty questions about deep learning basics",
        difficulty="medium",
        question_type="choice",
        count=3
    )

    print(f"✅ Generated {result['completed']}/{result['requested']} questions")
    for q in result['results']:
        print(f"- Relevance: {q['validation']['relevance']}")

asyncio.run(main())

Mimic Mode - PDF Upload:
import asyncio
from src.agents.question.tools.exam_mimic import mimic_exam_questions

async def main():
    result = await mimic_exam_questions(
        pdf_path="exams/midterm.pdf",
        kb_name="calculus",
        output_dir="data/user/question/mimic_papers",
        max_questions=5
    )
    print(f"✅ Generated {result['successful_generations']} questions")
    print(f"Output: {result['output_file']}")

asyncio.run(main())

Output Location
Custom Mode:
data/user/question/custom_YYYYMMDD_HHMMSS/
├── background_knowledge.json # RAG retrieval results
├── question_plan.json # Question planning
├── question_1_result.json # Individual question results
├── question_2_result.json
└── ...
Mimic Mode:
data/user/question/mimic_papers/
└── mimic_YYYYMMDD_HHMMSS_{pdf_name}/
├── {pdf_name}.pdf # Original PDF
├── auto/{pdf_name}.md # MinerU parsed markdown
├── {pdf_name}_YYYYMMDD_HHMMSS_questions.json # Extracted questions
└── {pdf_name}_YYYYMMDD_HHMMSS_generated_questions.json # Generated questions
🎓 Guided Learning
Personalized learning system based on notebook content, automatically generating progressive learning paths through interactive pages and smart Q&A.
Core Features
| Feature | Description |
|---|---|
| Multi-Agent Architecture | LocateAgent: Identifies 3-5 progressive knowledge points. InteractiveAgent: Converts to visual HTML pages. ChatAgent: Provides contextual Q&A. SummaryAgent: Generates learning summaries |
| Smart Knowledge Location | Automatic analysis of notebook content |
| Interactive Pages | HTML page generation with bug fixing |
| Smart Q&A | Context-aware answers with explanations |
| Progress Tracking | Real-time status with session persistence |
| Cross-Notebook Support | Select records from multiple notebooks |
Usage Flow
- Select Notebook(s) — Choose one or multiple notebooks (cross-notebook selection supported)
- Generate Learning Plan — LocateAgent identifies 3-5 core knowledge points
- Start Learning — InteractiveAgent generates HTML visualization
- Learning Interaction — Ask questions, click "Next" to proceed
- Complete Learning — SummaryAgent generates learning summary
Output Location
data/user/guide/
└── session_{session_id}.json # Complete session state, knowledge points, chat history
✏️ Interactive IdeaGen (Co-Writer)
Intelligent Markdown editor supporting AI-assisted writing, auto-annotation, and TTS narration.
Core Features
| Feature | Description |
|---|---|
| Rich Text Editing | Full Markdown syntax support with live preview |
| EditAgent | Rewrite: Custom instructions with optional RAG/web context. Shorten: Compress while preserving key information. Expand: Add details and context |
| Auto-Annotation | Automatic key content identification and marking |
| NarratorAgent | Script generation, TTS audio, multiple voices (Cherry, Stella, Annie, Cally, Eva, Bella) |
| Context Enhancement | Optional RAG or web search for additional context |
| Multi-Format Export | Markdown, PDF, etc. |
Usage
- Visit http://localhost:{frontend_port}/co_writer
- Enter or paste text in the editor
- Use AI features: Rewrite, Shorten, Expand, Auto Mark, Narrate
- Export to Markdown or PDF
Output Location
data/user/co-writer/
├── audio/ # TTS audio files
│ └── {operation_id}.mp3
├── tool_calls/ # Tool call history
│ └── {operation_id}_{tool_type}.json
└── history.json # Edit history
🔬 Deep Research
DR-in-KG (Deep Research in Knowledge Graph) — A systematic deep research system based on Dynamic Topic Queue architecture, enabling multi-agent collaboration across three phases: Planning → Researching → Reporting.
Core Features
| Feature | Description |
|---|---|
| Three-Phase Architecture | Phase 1 (Planning): RephraseAgent (topic optimization) + DecomposeAgent (subtopic decomposition). Phase 2 (Researching): ManagerAgent (queue scheduling) + ResearchAgent (research decisions) + NoteAgent (info compression). Phase 3 (Reporting): Deduplication → Three-level outline generation → Report writing with citations |
| Dynamic Topic Queue | Core scheduling system with TopicBlock state management: PENDING → RESEARCHING → COMPLETED/FAILED. Supports dynamic topic discovery during research |
| Execution Modes | Series Mode: Sequential topic processing. Parallel Mode: Concurrent multi-topic processing with AsyncCitationManagerWrapper for thread-safe operations |
| Multi-Tool Integration | RAG (hybrid/naive), Query Item (entity lookup), Paper Search, Web Search, Code Execution — dynamically selected by ResearchAgent |
| Unified Citation System | Centralized CitationManager as single source of truth for citation ID generation, ref_number mapping, and deduplication |
| Preset Configurations | quick: Fast research (1-2 subtopics, 1-2 iterations). medium/standard: Balanced depth (5 subtopics, 4 iterations). deep: Thorough research (8 subtopics, 7 iterations). auto: Agent autonomously decides depth |
Citation System Architecture
The citation system follows a centralized design with CitationManager as the single source of truth:
┌─────────────────────────────────────────────────────────────────┐
│ CitationManager │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ ID Generation │ │ ref_number Map │ │ Deduplication │ │
│ │ PLAN-XX │ │ citation_id → │ │ (papers only) │ │
│ │ CIT-X-XX │ │ ref_number │ │ │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
└───────────┼────────────────────┼────────────────────┼───────────┘
│ │ │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│DecomposeAgent│ │ReportingAgent│ │ References │
│ ResearchAgent│ │ (inline [N]) │ │ Section │
│ NoteAgent │ └─────────────┘ └────────────┘
└─────────────┘
| Component | Description |
|---|---|
| ID Format | PLAN-XX (planning stage RAG queries) + CIT-X-XX (research stage, X=block number) |
| ref_number Mapping | Sequential 1-based numbers built from sorted citation IDs, with paper deduplication |
| Inline Citations | Simple [N] format in LLM output, post-processed to clickable [[N]](#ref-N) links |
| Citation Table | Clear reference table provided to LLM: Cite as [1] → (RAG) query preview... |
| Post-processing | Automatic format conversion + validation to remove invalid citation references |
| Parallel Safety | Thread-safe async methods (get_next_citation_id_async, add_citation_async) for concurrent execution |
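The ref_number mapping row above (sequential 1-based numbers built from sorted citation IDs) can be illustrated with a small sketch. Note the real system deduplicates papers specifically; here exact duplicate IDs are collapsed just for illustration, and the function name echoes but does not reproduce the project's build_ref_number_map:

```python
def build_ref_number_map(citation_ids: list[str]) -> dict[str, int]:
    """Assign sequential 1-based ref numbers to deduplicated, sorted citation IDs."""
    ref_map: dict[str, int] = {}
    for cid in sorted(set(citation_ids)):  # stable order -> stable numbering
        ref_map[cid] = len(ref_map) + 1
    return ref_map

print(build_ref_number_map(["PLAN-01", "CIT-1-01", "CIT-1-02", "CIT-1-01"]))
# → {'CIT-1-01': 1, 'CIT-1-02': 2, 'PLAN-01': 3}
```

Because the numbering is derived deterministically from the sorted ID set, every agent that consults the map sees the same [N] for the same source.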
Parallel Execution Architecture
When execution_mode: "parallel" is enabled, multiple topic blocks are researched concurrently:
┌─────────────────────────────────────────────────────────────────────────┐
│ Parallel Research Execution │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ DynamicTopicQueue AsyncCitationManagerWrapper │
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ Topic 1 (PENDING)│ ──┐ │ Thread-safe wrapper │ │
│ │ Topic 2 (PENDING)│ ──┼──→ asyncio │ for CitationManager │ │
│ │ Topic 3 (PENDING)│ ──┤ Semaphore │ │ │
│ │ Topic 4 (PENDING)│ ──┤ (max=5) │ • get_next_citation_ │ │
│ │ Topic 5 (PENDING)│ ──┘ │ id_async() │ │
│ └─────────────────┘ │ • add_citation_async() │ │
│ │ └───────────┬─────────────┘ │
│ ▼ │ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Concurrent ResearchAgent Tasks │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Task 1 │ │ Task 2 │ │ Task 3 │ │ Task 4 │ ... │ │
│ │ │(Topic 1)│ │(Topic 2)│ │(Topic 3)│ │(Topic 4)│ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │ │ │
│ │ └────────────┴────────────┴────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ AsyncManagerAgentWrapper │ │
│ │ (Thread-safe queue updates) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
| Component | Description |
|---|---|
| asyncio.Semaphore | Limits concurrent tasks to max_parallel_topics (default: 5) |
| AsyncCitationManagerWrapper | Wraps CitationManager with asyncio.Lock() for thread-safe ID generation |
| AsyncManagerAgentWrapper | Ensures queue state updates are atomic across parallel tasks |
| Real-time Progress | Live display of all active research tasks with status indicators |
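The semaphore-plus-lock pattern in the table can be sketched as follows. CitationManager here is a toy stand-in (only a counter), and the method body is an assumption; only the class name, asyncio.Lock() usage, and get_next_citation_id_async name come from the description above:

```python
import asyncio

class CitationManager:
    """Toy stand-in: only the ID counter matters for this sketch."""
    def __init__(self):
        self._counter = 0

    def get_next_citation_id(self, block: int) -> str:
        self._counter += 1
        return f"CIT-{block}-{self._counter:02d}"

class AsyncCitationManagerWrapper:
    """Serialize ID generation with asyncio.Lock so concurrent research
    tasks never mint duplicate citation IDs."""
    def __init__(self, manager: CitationManager):
        self._manager = manager
        self._lock = asyncio.Lock()

    async def get_next_citation_id_async(self, block: int) -> str:
        async with self._lock:  # one coroutine at a time
            return self._manager.get_next_citation_id(block)

async def demo():
    wrapper = AsyncCitationManagerWrapper(CitationManager())
    sem = asyncio.Semaphore(5)  # mirrors max_parallel_topics: 5

    async def research_task(block: int) -> str:
        async with sem:  # cap the number of in-flight topics
            return await wrapper.get_next_citation_id_async(block)

    ids = await asyncio.gather(*(research_task(1) for _ in range(4)))
    print(ids)

asyncio.run(demo())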
Agent Responsibilities
| Agent | Phase | Responsibility |
|---|---|---|
| RephraseAgent | Planning | Optimizes user input topic, supports multi-turn user interaction for refinement |
| DecomposeAgent | Planning | Decomposes topic into subtopics with RAG context, obtains citation IDs from CitationManager |
| ManagerAgent | Researching | Queue state management, task scheduling, dynamic topic addition |
| ResearchAgent | Researching | Knowledge sufficiency check, query planning, tool selection, requests citation IDs before each tool call |
| NoteAgent | Researching | Compresses raw tool outputs into summaries, creates ToolTraces with pre-assigned citation IDs |
| ReportingAgent | Reporting | Builds citation map, generates three-level outline, writes report sections with citation tables, post-processes citations |
Report Generation Pipeline
1. Build Citation Map → CitationManager.build_ref_number_map()
2. Generate Outline → Three-level headings (H1 → H2 → H3)
3. Write Sections → LLM uses [N] citations with provided citation table
4. Post-process → Convert [N] → [[N]](#ref-N), validate references
5. Generate References → Academic-style entries with collapsible source details
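Step 4 of the pipeline above (converting plain [N] citations to clickable links and removing invalid references) can be sketched with a single regex pass; the function is illustrative, not the project's actual post-processor:

```python
import re

def linkify_citations(text: str, valid_refs: set[int]) -> str:
    """Convert [N] to [[N]](#ref-N) links; drop refs absent from the citation map."""
    def repl(m: re.Match) -> str:
        n = int(m.group(1))
        if n not in valid_refs:
            return ""  # validation: strip invalid citation references
        return f"[[{n}]](#ref-{n})"
    return re.sub(r"\[(\d+)\]", repl, text)

print(linkify_citations("Attention is key [1], see also [7].", valid_refs={1, 2}))
```

Each surviving [[N]](#ref-N) then resolves to the anchor emitted for entry N in the References section (step 5).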
Usage
- Visit http://localhost:{frontend_port}/research
- Enter research topic
- Select research mode (quick/medium/deep/auto)
- Watch real-time progress with parallel/series execution
- View structured report with clickable inline citations
- Export as Markdown or PDF (with proper page splitting and Mermaid diagram support)
CLI
# Quick mode (fast research)
python src/agents/research/main.py --topic "Deep Learning Basics" --preset quick
# Medium mode (balanced)
python src/agents/research/main.py --topic "Transformer Architecture" --preset medium
# Deep mode (thorough research)
python src/agents/research/main.py --topic "Graph Neural Networks" --preset deep
# Auto mode (agent decides depth)
python src/agents/research/main.py --topic "Reinforcement Learning" --preset auto

Python API
import asyncio
from src.agents.research import ResearchPipeline
from src.core.core import get_llm_config, load_config_with_main
async def main():
    # Load configuration (main.yaml merged with any module-specific overrides)
    config = load_config_with_main("research_config.yaml")
    llm_config = get_llm_config()

    # Create pipeline (agent parameters loaded from agents.yaml automatically)
    pipeline = ResearchPipeline(
        config=config,
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"],
        kb_name="ai_textbook"  # Optional: override knowledge base
    )

    # Run research
    result = await pipeline.run(topic="Attention Mechanisms in Deep Learning")
    print(f"Report saved to: {result['final_report_path']}")

asyncio.run(main())

Output Location
data/user/research/
├── reports/ # Final research reports
│ ├── research_YYYYMMDD_HHMMSS.md # Markdown report with clickable citations [[N]](#ref-N)
│ └── research_*_metadata.json # Research metadata and statistics
└── cache/ # Research process cache
└── research_YYYYMMDD_HHMMSS/
├── queue.json # DynamicTopicQueue state (TopicBlocks + ToolTraces)
├── citations.json # Citation registry with ID counters and ref_number mapping
│ # - citations: {citation_id: citation_info}
│ # - counters: {plan_counter, block_counters}
├── step1_planning.json # Planning phase results (subtopics + PLAN-XX citations)
├── planning_progress.json # Planning progress events
├── researching_progress.json # Researching progress events
├── reporting_progress.json # Reporting progress events
├── outline.json # Three-level report outline structure
└── token_cost_summary.json # Token usage statistics
Citation File Structure (citations.json):
{
  "research_id": "research_20241209_120000",
  "citations": {
    "PLAN-01": {"citation_id": "PLAN-01", "tool_type": "rag_hybrid", "query": "...", "summary": "..."},
    "CIT-1-01": {"citation_id": "CIT-1-01", "tool_type": "paper_search", "papers": [...], ...}
  },
  "counters": {
    "plan_counter": 2,
    "block_counters": {"1": 3, "2": 2}
  }
}

Configuration Options
Key configuration in config/main.yaml (research section) and config/agents.yaml:
# config/agents.yaml - Agent LLM parameters
research:
  temperature: 0.5
  max_tokens: 12000

# config/main.yaml - Research settings
research:
  # Execution Mode
  researching:
    execution_mode: "parallel"     # "series" or "parallel"
    max_parallel_topics: 5         # Max concurrent topics
    max_iterations: 5              # Max iterations per topic

    # Tool Switches
    enable_rag_hybrid: true        # Hybrid RAG retrieval
    enable_rag_naive: true         # Basic RAG retrieval
    enable_paper_search: true      # Academic paper search
    enable_web_search: true        # Web search (also controlled by tools.web_search.enabled)
    enable_run_code: true          # Code execution

  # Queue Limits
  queue:
    max_length: 5                  # Maximum topics in queue

  # Reporting
  reporting:
    enable_inline_citations: true  # Enable clickable [N] citations in report

# Presets: quick, medium, deep, auto

# Global tool switches in tools section
tools:
  web_search:
    enabled: true                  # Global web search switch (higher priority)

💡 Automated IdeaGen
Research idea generation system that extracts knowledge points from notebook records and generates research ideas through multi-stage filtering.
Core Features
| Feature | Description |
|---|---|
| MaterialOrganizerAgent | Extracts knowledge points from notebook records |
| Multi-Stage Filtering | Loose Filter → Explore Ideas (5+ per point) → Strict Filter → Generate Markdown |
| Idea Exploration | Innovative thinking from multiple dimensions |
| Structured Output | Organized markdown with knowledge points and ideas |
| Progress Callbacks | Real-time updates for each stage |
Usage
- Visit http://localhost:{frontend_port}/ideagen
- Select a notebook with records
- Optionally provide user thoughts/preferences
- Click "Generate Ideas"
- View generated research ideas organized by knowledge points
Python API
import asyncio
from src.agents.ideagen import IdeaGenerationWorkflow, MaterialOrganizerAgent
from src.core.core import get_llm_config
async def main():
    llm_config = get_llm_config()

    # Step 1: Extract knowledge points from materials
    organizer = MaterialOrganizerAgent(
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"]
    )
    knowledge_points = await organizer.extract_knowledge_points(
        "Your learning materials or notebook content here"
    )

    # Step 2: Generate research ideas
    workflow = IdeaGenerationWorkflow(
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"]
    )
    result = await workflow.process(knowledge_points)
    print(result)  # Markdown formatted research ideas

asyncio.run(main())

📊 Dashboard + Knowledge Base Management
Unified system entry providing activity tracking, knowledge base management, and system status monitoring.
Key Features
| Feature | Description |
|---|---|
| Activity Statistics | Recent solving/generation/research records |
| Knowledge Base Overview | KB list, statistics, incremental updates |
| Notebook Statistics | Notebook counts, record distribution |
| Quick Actions | One-click access to all modules |
Usage
- Web Interface: Visit http://localhost:{frontend_port} to view system overview
- Create KB: Click "New Knowledge Base", upload PDF/Markdown documents
- View Activity: Check recent learning activities on Dashboard
📓 Notebook
Unified learning record management, connecting outputs from all modules to create a personalized learning knowledge base.
Core Features
| Feature | Description |
|---|---|
| Multi-Notebook Management | Create, edit, delete notebooks |
| Unified Record Storage | Integrate solving/generation/research/Interactive IdeaGen records |
| Categorization Tags | Auto-categorize by type, knowledge base |
| Custom Appearance | Color, icon personalization |
Usage
- Visit http://localhost:{frontend_port}/notebook
- Create new notebook (set name, description, color, icon)
- After completing tasks in other modules, click "Add to Notebook"
- View and manage all records on the notebook page
| Configuration | Data Directory | API Backend | Core Utilities |
|---|---|---|---|
| Knowledge Base | Tools | Web Frontend | Solve Module |
| Question Module | Research Module | Interactive IdeaGen Module | Guide Module |
| Automated IdeaGen Module | | | |
Backend fails to start?
Checklist
- Confirm Python version >= 3.10
- Confirm all dependencies installed: pip install -r requirements.txt
- Check if port 8001 is in use (configurable in config/main.yaml)
- Check .env file configuration

Solutions

- Change port: Edit server.backend_port in config/main.yaml
- Check logs: Review terminal error messages
Port occupied after Ctrl+C?
Problem
After pressing Ctrl+C during a running task (e.g., deep research), restarting shows "port already in use" error.
Cause
Ctrl+C sometimes only terminates the frontend process while the backend continues running in the background.
Solution
# macOS/Linux: Find and kill the process
lsof -i :8001
kill -9 <PID>
# Windows: Find and kill the process
netstat -ano | findstr :8001
taskkill /PID <PID> /F

Then restart the service with python scripts/start_web.py.
npm: command not found error?
Problem
Running scripts/start_web.py shows npm: command not found or exit status 127.
Checklist
- Check if npm is installed: npm --version
- Check if Node.js is installed: node --version
- Confirm conda environment is activated (if using conda)
Solutions
# Option A: Using Conda (Recommended)
conda install -c conda-forge nodejs
# Option B: Using Official Installer
# Download from https://nodejs.org/
# Option C: Using nvm
nvm install 18
nvm use 18

Verify Installation

node --version  # Should show v18.x.x or higher
npm --version   # Should show a version number

Frontend cannot connect to backend?
Checklist
- Confirm backend is running (visit http://localhost:8001/docs)
- Check browser console for error messages
Solution
Create .env.local in web directory:
NEXT_PUBLIC_API_BASE=http://localhost:8001

WebSocket connection fails?
Checklist
- Confirm backend is running
- Check firewall settings
- Confirm WebSocket URL is correct
Solution
- Check backend logs
- Confirm URL format: ws://localhost:8001/api/v1/...
Where are module outputs stored?
| Module | Output Path |
|---|---|
| Solve | data/user/solve/solve_YYYYMMDD_HHMMSS/ |
| Question | data/user/question/question_YYYYMMDD_HHMMSS/ |
| Research | data/user/research/reports/ |
| Interactive IdeaGen | data/user/co-writer/ |
| Notebook | data/user/notebook/ |
| Guide | data/user/guide/session_{session_id}.json |
| Logs | data/user/logs/ |
How to add a new knowledge base?
Web Interface
- Visit http://localhost:{frontend_port}/knowledge
- Click "New Knowledge Base"
- Enter knowledge base name
- Upload PDF/TXT/MD documents
- System will process documents in background
CLI
python -m src.knowledge.start_kb init <kb_name> --docs <pdf_path>

How to incrementally add documents to existing KB?
CLI (Recommended)
python -m src.knowledge.add_documents <kb_name> --docs <new_document.pdf>

Benefits
- Only processes new documents, saves time and API costs
- Automatically merges with existing knowledge graph
- Preserves all existing data
Numbered items extraction failed with uvloop.Loop error?
Problem
When initializing a knowledge base, you may encounter this error:
ValueError: Can't patch loop of type <class 'uvloop.Loop'>
This occurs because Uvicorn uses the uvloop event loop by default, which is incompatible with nest_asyncio.
Solution
Use one of the following methods to extract numbered items:
# Option 1: Using the shell script (recommended)
./scripts/extract_numbered_items.sh <kb_name>
# Option 2: Direct Python command
python src/knowledge/extract_numbered_items.py --kb <kb_name> --base-dir ./data/knowledge_bases

This will extract numbered items (Definitions, Theorems, Equations, etc.) from your knowledge base without reinitializing it.
This project is licensed under the AGPL-3.0 License.
We welcome contributions from the community! To ensure code quality and consistency, please follow the guidelines below.
Development Setup
This project uses pre-commit hooks to automatically format code and check for issues before commits.
Step 1: Install pre-commit
# Using pip
pip install pre-commit
# Or using conda
conda install -c conda-forge pre-commit

Step 2: Install Git hooks
cd DeepTutor
pre-commit install

Step 3: (Optional) Run checks on all files
pre-commit run --all-files

Every time you run git commit, pre-commit hooks will automatically:
- Format Python code with Ruff
- Format frontend code with Prettier
- Check for syntax errors
- Validate YAML/JSON files
- Detect potential security issues
| Tool | Purpose | Configuration |
|---|---|---|
| Ruff | Python linting & formatting | pyproject.toml |
| Prettier | Frontend code formatting | web/.prettierrc.json |
| detect-secrets | Security check | .secrets.baseline |
Note: The project uses Ruff format instead of Black to avoid formatting conflicts.
# Normal commit (hooks run automatically)
git commit -m "Your commit message"
# Manually check all files
pre-commit run --all-files
# Update hooks to latest versions
pre-commit autoupdate
# Skip hooks (not recommended, only for emergencies)
git commit --no-verify -m "Emergency fix"

- Fork and Clone: Fork the repository and clone your fork
- Create Branch: Create a feature branch from main
- Install Pre-commit: Follow the setup steps above
- Make Changes: Write your code following the project's style
- Test: Ensure your changes work correctly
- Commit: Pre-commit hooks will automatically format your code
- Push and PR: Push to your fork and create a Pull Request
- Use GitHub Issues to report bugs or suggest features
- Provide detailed information about the issue
- Include steps to reproduce if it's a bug
❤️ We thank all our contributors for their valuable contributions.
| ⚡ LightRAG | 🎨 RAG-Anything | 💻 DeepCode | 🔬 AI-Researcher |
|---|---|---|---|
| Simple and Fast RAG | Multimodal RAG | AI Code Assistant | Research Automation |
⭐ Star us · 🐛 Report a bug · 💬 Discussions
✨ Thanks for visiting DeepTutor!






