OCR Text Vision Pro

AI-powered OCR application using Llama 3.2 Vision for advanced image understanding and text extraction

✨ Features

📈 General OCR & Content Recognition

Text Extraction: Extract readable content from any image
LaTeX Conversion: Convert mathematical equations to LaTeX code with live rendering
Code Extraction: Extract and format code snippets from screenshots
Chart Analysis: Describe charts, diagrams, and visual data

📑 Advanced Document Intelligence

Document VQA: Ask specific questions about document content
Structured Extraction: Extract invoice numbers, dates, amounts, etc.
Form Processing: Handle contracts, receipts, and business documents

❓ Intelligent Visual Question Answering

Scene Understanding: Analyze and describe image content
Object Recognition: Identify and reason about objects in images
Visual Reasoning: Answer complex questions about visual content

🗣️ Multi-modal Chat Assistant

Interactive Conversations: Chat with AI about uploaded images
Context Awareness: Maintains conversation history
Real-time Responses: Instant AI-powered image analysis

🔁 Workflow

flowchart TB
 subgraph subGraph0["Streamlit App"]
        UI["Presentation Layer (Streamlit UI)"]
        PIL["Image Preprocessor (PIL)"]
        APIClient["API Client (requests)"]
        Session["Session Manager<br>(in-memory API key &amp; chat history)"]
  end
    Browser["User’s Web Browser"] -- UI event / upload image --> UI
    UI -- image --> PIL
    PIL -- processed image --> APIClient
    UI -- store API key & history --> Session
    Session -- provide API key --> APIClient
    APIClient -- HTTP POST --> External["OpenRouter API<br>(Llama 3.2 Vision)"]
    External -- JSON response --> APIClient
    APIClient -- parsed results --> UI
    Deployment["Deployment Environment<br>(Streamlit Cloud or Docker)"] -- hosts --> UI

     UI:::frontend
     PIL:::app
     APIClient:::app
     Session:::app
     Browser:::frontend
     External:::external
     Deployment:::deployment
    classDef frontend fill:#D6EAF8,stroke:#1B4F72
    classDef app fill:#D5F5E3,stroke:#145A32
    classDef external fill:#FAD7A0,stroke:#B9770E
    classDef deployment fill:#E5E7E9,stroke:#566573
    style Browser color:#000000
    style External color:#000000
    style Deployment color:#000000
    click UI "https://github.com/bcastelino/ocr-text-vision-pro/blob/main/ocr_app.py"
    click PIL "https://github.com/bcastelino/ocr-text-vision-pro/blob/main/ocr_app.py"
    click APIClient "https://github.com/bcastelino/ocr-text-vision-pro/blob/main/ocr_app.py"
    click Session "https://github.com/bcastelino/ocr-text-vision-pro/blob/main/ocr_app.py"

🚀 Quick Start

Option 1: Streamlit Community Cloud (Recommended)

Deploy directly - No setup required!
Enter your OpenRouter API key (free)
Start uploading images and extracting text!

Option 2: Local Development

# Clone the repository
git clone https://github.com/bcastelino/ocr-text-vision-pro.git
cd ocr-text-vision-pro

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run ocr_app.py

Option 3: Docker

# Build and run with Docker
docker build -t ocr-text-vision-pro .
docker run -p 8501:8501 ocr-text-vision-pro

🔧 Tech Stack

Frontend: Streamlit (Python web framework)
AI Model: Llama 3.2-11B Vision (via OpenRouter API)
Image Processing: PIL/Pillow
HTTP Client: Requests
Deployment: Streamlit Community Cloud

📋 Requirements

Python 3.11+
OpenRouter API key (free tier available)
Modern web browser

🎯 Use Cases

Students: Extract text from lecture slides and handwritten notes
Researchers: Convert mathematical equations to LaTeX
Developers: Extract code from screenshots and documentation
Business: Process invoices, receipts, and contracts
Content Creators: Analyze charts and extract data for reports

🔐 Privacy & Security

API keys are stored only in session (not persisted)
No image data is stored on my server
All processing happens through the secure OpenRouter API
Runs entirely in your browser session

🤝 Contributing

Welcome contributors! Please feel free to submit issues and enhancement requests.

📄 License

This project is open source and available under the MIT License.

🐱‍👤 Author

Brian Denis Castelino

Data Analytics Engineer | AI Enthusiast

I turn vague ideas into clean, working systems, because someone’s got to 🤖

Made with ❤️ using Streamlit and Llama 3.2 Vision

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.devcontainer		.devcontainer
.streamlit		.streamlit
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
ocr_app.py		ocr_app.py
requirements.txt		requirements.txt
runtime.txt		runtime.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OCR Text Vision Pro

✨ Features

📈 General OCR & Content Recognition

📑 Advanced Document Intelligence

❓ Intelligent Visual Question Answering

🗣️ Multi-modal Chat Assistant

🔁 Workflow

🚀 Quick Start

Option 1: Streamlit Community Cloud (Recommended)

Option 2: Local Development

Option 3: Docker

🔧 Tech Stack

📋 Requirements

🎯 Use Cases

🔐 Privacy & Security

🤝 Contributing

📄 License

🐱‍👤 Author

Brian Denis Castelino

About

Uh oh!

Releases

Packages

Languages

License

bcastelino/ocr-text-vision-pro

Folders and files

Latest commit

History

Repository files navigation

OCR Text Vision Pro

✨ Features

📈 General OCR & Content Recognition

📑 Advanced Document Intelligence

❓ Intelligent Visual Question Answering

🗣️ Multi-modal Chat Assistant

🔁 Workflow

🚀 Quick Start

Option 1: Streamlit Community Cloud (Recommended)

Option 2: Local Development

Option 3: Docker

🔧 Tech Stack

📋 Requirements

🎯 Use Cases

🔐 Privacy & Security

🤝 Contributing

📄 License

🐱‍👤 Author

Brian Denis Castelino

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages