An offline, privacy-first AI tutor that helps you learn from your study materials through intelligent summarization, Q&A, and quiz generation.
Features:
- PDF Upload & Processing: Extract and chunk text from study materials
- Intelligent Summarization: Generate bullet-point, paragraph, or exam-style summaries
- Interactive Q&A: Ask questions and get context-aware answers using RAG
- Quiz Generation (Coming Soon): Auto-generate MCQs from your documents
- 100% Offline: All processing happens locally - zero API keys, zero data leakage
- GPU Accelerated: Optimized for consumer GPUs (RTX 4060 and above)
Backend:
- Framework: FastAPI
- Models:
  - `microsoft/Phi-3-mini-4k-instruct` (3.8B) - Summarization
  - `google/flan-t5-xl` (3B) - MCQ Generation
  - `BAAI/bge-small-en-v1.5` (33M) - Embeddings
- Vector DB: FAISS
- PDF Processing: pdfplumber, PyMuPDF
- Quantization: 4-bit for efficient VRAM usage
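
To make the retrieval half of this stack concrete, here is a minimal sketch of how the bge-small embeddings and FAISS fit together, assuming `sentence-transformers` and `faiss` are installed (the project's actual implementation lives in `backend/services/`, and the chunk texts below are placeholders):

```python
# Sketch only: embed chunks, index them in FAISS, retrieve the best matches for a query.
# The real logic lives in backend/services/vector_store.py and rag_pipeline.py.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # 33M-param embedding model

chunks = ["Neural networks are ...", "Backpropagation computes ..."]  # placeholder chunks
vectors = embedder.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(vectors.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(vectors, dtype=np.float32))

query = embedder.encode(["How does backpropagation work?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype=np.float32), 2)  # top-2 chunks
context = "\n".join(chunks[i] for i in ids[0])  # grounding context handed to Phi-3
```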
Frontend:
- Framework: Next.js 14 + React
- Styling: TailwindCSS
- UI Components: shadcn/ui
- State Management: React Hooks
Quick start:

```bash
# Clone the repository
git clone <repository-url>
cd DocuMentor

# Set up the backend
cd backend && ./setup_environment.sh && cd ..

# Set up the frontend
cd website/client && npm install && cd ../..

# Start the backend and frontend (from the repo root, in separate terminals)
./start_backend.sh
./start_frontend.sh
```

Navigate to: http://localhost:3000
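
Once both are running, a quick way to confirm the backend is healthy (a small sketch using Python `requests`; the endpoint is listed under the API endpoints below):

```python
# Sketch: confirm the backend is up before opening the UI.
import requests

resp = requests.get("http://localhost:8000/api/v1/health")
print(resp.status_code, resp.json())
```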
For detailed setup instructions, see SETUP_GUIDE.md
- Click "Choose File" or drag & drop a PDF
- Wait for processing (~10-30 seconds)
- Document appears in sidebar
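
Under the hood, that processing step boils down to extract-then-chunk. A minimal sketch using pdfplumber (the chunk size and overlap are illustrative guesses; the actual logic lives in `backend/services/pdf_processor.py`):

```python
# Sketch: extract text from a PDF and split it into overlapping chunks.
# chunk_size/overlap are illustrative, not the project's real settings.
import pdfplumber

def extract_chunks(path: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    with pdfplumber.open(path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]

chunks = extract_chunks("notes.pdf")  # each chunk is then embedded and indexed in FAISS
```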
Type in chat:
```
Summarize this document
What is the main topic of this document?
Explain the concept of neural networks
List all the formulas mentioned
Generate 5 MCQs from chapter 2
```
Project structure:

```
DocuMentor/
├── backend/                  # FastAPI backend
│   ├── main.py               # Entry point
│   ├── models/               # ML model wrappers
│   │   ├── embeddings.py
│   │   ├── phi3_summarizer.py
│   │   └── t5_quiz_generator.py
│   ├── services/             # Business logic
│   │   ├── pdf_processor.py
│   │   ├── vector_store.py
│   │   └── rag_pipeline.py
│   ├── api/                  # API routes & schemas
│   ├── utils/                # Configuration & utilities
│   └── requirements.txt
├── website/
│   ├── client/               # React frontend
│   │   ├── components/       # UI components
│   │   ├── lib/              # API client
│   │   └── pages/            # Next.js pages
│   └── shared/               # Shared TypeScript types
├── data/                     # Runtime data (gitignored)
│   ├── uploads/              # Uploaded PDFs
│   ├── vectors/              # FAISS indices
│   └── processed/            # Processed documents
├── models/                   # Model cache (gitignored)
├── claude.md                 # Architecture documentation
├── SETUP_GUIDE.md            # Detailed setup guide
├── start_backend.sh          # Backend startup script
└── start_frontend.sh         # Frontend startup script
```
API endpoints:
- `POST /api/v1/upload` - Upload PDF
- `GET /api/v1/documents` - List documents
- `DELETE /api/v1/documents/{doc_id}` - Delete document
- `POST /api/v1/summarize` - Summarize document
- `POST /api/v1/ask` - Ask question (RAG)
- `POST /api/v1/generate-quiz` - Generate MCQs (Coming Soon)
- `GET /api/v1/health` - Health check
Full API documentation available at: http://localhost:8000/docs
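
For scripted use, here is a sketch of the upload → summarize → ask flow with Python `requests`. The JSON field names (`doc_id`, `style`, `question`) are assumptions for illustration; check the live schemas at http://localhost:8000/docs:

```python
# Sketch: drive the backend over HTTP. Field names are assumptions; verify at /docs.
import requests

BASE = "http://localhost:8000/api/v1"

# Upload a PDF; the response is assumed to contain a document id
with open("notes.pdf", "rb") as f:
    doc = requests.post(f"{BASE}/upload", files={"file": f}).json()
doc_id = doc.get("doc_id")

# Request a summary in one of the supported styles (bullet-point / paragraph / exam-style)
summary = requests.post(f"{BASE}/summarize", json={"doc_id": doc_id, "style": "bullet-point"}).json()

# Ask a question; the backend answers through the RAG pipeline
answer = requests.post(f"{BASE}/ask", json={"doc_id": doc_id, "question": "What is the main topic?"}).json()

print(summary, answer, sep="\n")
```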
Roadmap:
- PDF ingestion & chunking
- Embeddings & FAISS vector store
- Local LLM (Phi-3) integration
- RAG pipeline for Q&A
- FastAPI backend
- React frontend
- Document upload & management
- Chunk-level & full-doc summarization
- Multiple summary styles
- MCQ generation (Flan-T5-XL)
- Open-ended question generation
- Flashcard creation
- Spaced repetition system
- Multi-document support
- Topic-based search
- Session management
- Export (PDF/Markdown/Anki)
- Performance optimizations
- Analytics dashboard
System requirements:
- OS: Linux (recommended), macOS, or Windows with WSL
- Python: 3.10+
- Node.js: 18+
- GPU: NVIDIA GPU with 6GB+ VRAM (RTX 3060+, RTX 4060+)
- CUDA: 12.1+
- RAM: 16GB+
- Disk: 15GB+ free space
Note: 4-bit quantization is now enabled by default, so DocuMentor runs on GPUs with as little as 6GB of VRAM!
- Embedding model: ~150MB VRAM
- Phi-3: ~2-3GB VRAM (reduced from 7GB!)
- Flan-T5-XL: ~2-3GB VRAM (when loaded)
- Total: ~5-6GB VRAM (fits comfortably on 8GB GPUs)
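
Those numbers come from loading weights in 4-bit. A sketch of what such a load looks like with transformers + bitsandbytes (the project's actual wrapper, `backend/models/phi3_summarizer.py`, may configure this differently):

```python
# Sketch: 4-bit (NF4) load of Phi-3 via bitsandbytes; ~2-3GB VRAM instead of ~7GB in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 weight format
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute for speed
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=quant,
    device_map="auto",  # place layers on the GPU automatically
)
```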
Typical performance:
- PDF Upload & Processing: 10-30 seconds
- First Query (model loading): 30-60 seconds
- Subsequent Queries: 3-10 seconds
- Summarization: 20-60 seconds (depending on length)
Troubleshooting: see SETUP_GUIDE.md for detailed steps.
Out of GPU memory? FIXED! The application now uses 4-bit quantization by default, reducing memory from ~7GB to ~2-3GB.
For detailed memory optimization guide, see MEMORY_OPTIMIZATION_GUIDE.md
Quick fixes:
- Automatic: Just restart the backend - memory optimizations are now enabled
- Still having issues? Check GPU processes with `nvidia-smi`
- Last resort: use CPU mode (slower but works):

```python
# Edit backend/utils/config.py
DEVICE = "cpu"  # Fallback to CPU
```

- Clear the model cache:

```bash
rm -rf models/  # Clear cache; models will re-download on next use
```

If the frontend can't reach the backend:
- Verify the backend is running: http://localhost:8000/docs
- Check the frontend API URL in `website/client/lib/api.ts`
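
Whichever fix you try, it helps to confirm what PyTorch actually sees from the backend environment (a quick sketch, assuming torch is installed there):

```python
# Sketch: check CUDA visibility and total VRAM from the backend's Python environment.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 1e9:.1f} GB VRAM")
```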
This is an educational project. Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Use this architecture for learning
Privacy:
- ✅ 100% offline - no external API calls
- ✅ All data stays on your machine
- ✅ No telemetry or tracking
- ✅ Documents never leave your device
- ✅ No API keys required
MIT License - See LICENSE file for details
Resources:
- Documentation: claude.md
- Setup Guide: SETUP_GUIDE.md
- Memory Optimization: MEMORY_OPTIMIZATION_GUIDE.md
- Backend README: backend/README.md
- API Docs (when running): http://localhost:8000/docs
Version: 1.0.0
Last Updated: 2025-11-06
Status: Active Development
Maintainer: Educational Project
Made with ❤️ for students who want to own their learning tools