Run ChatGPT-quality AI models 100% locally on your home server - no API keys, no cloud costs, no data leaving your network. This Docker Compose setup gives you a production-ready AI stack with a beautiful web interface.
- 🤖 GPT-OSS Models - OpenAI's powerful open-weight language models (20B & 120B parameters)
- 🎨 Flux Image Generation - State-of-the-art text-to-image with ComfyUI
- 📄 Document Intelligence - Extract and chat with PDFs, images, and documents via Apache Tika
- 💬 Beautiful Chat Interface - Open WebUI with dark mode, conversation history, and model switching
- 🔒 100% Private - Everything runs locally, your data never leaves your server
- ⚡ Hardware Flexible - Modular configuration adapts to your GPU setup
This repository provides a plug-and-play Docker Compose configuration designed for home servers with NVIDIA GPUs. All services are pre-configured but commented out by default - simply uncomment what your hardware supports and deploy!
Minimum requirements:

- NVIDIA GPU with at least 16GB VRAM
- 32GB system RAM
- Docker with the NVIDIA Container Toolkit
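A quick way to confirm Docker can actually see your GPUs before deploying anything (the CUDA image tag here is just an example; any recent `nvidia/cuda` base image works):

```bash
# Should print the same GPU table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```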
Approximate VRAM per model:

- GPT-OSS-20B: ~16GB VRAM (1x A6000 or similar)
- GPT-OSS-120B: ~96GB VRAM (2x A6000)
- Llama-3.2-11B-Vision: ~24GB VRAM
- ComfyUI Flux: ~32GB VRAM
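To see what your cards actually have before choosing models:

```bash
# List each GPU's index, name, and total memory
nvidia-smi --query-gpu=index,name,memory.total --format=csv
```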
Example deployments:

- **Basic text generation**
  - Enable: Open WebUI, Tika, GPT-OSS-20B
  - Result: Text generation with the 20B model
- **Advanced text generation**
  - Enable: Open WebUI, Tika, GPT-OSS-120B
  - Result: Advanced text generation with the 120B model
- **Full stack** (sketched below)
  - Enable: Open WebUI, Tika, GPT-OSS-20B, ComfyUI
  - GPU assignment: GPU 0 for ComfyUI, GPU 2 for GPT-OSS-20B
  - Result: Full text + image generation stack
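In `docker-compose.yaml` terms, the full-stack option pins each heavy service to its own card. A minimal sketch (service names match this repo's compose file; everything else about the services is omitted):

```yaml
services:
  comfyui:
    environment:
      CUDA_VISIBLE_DEVICES: "0"   # Flux image generation on GPU 0
  vllm-gptoss-20b:
    environment:
      CUDA_VISIBLE_DEVICES: "2"   # 20B text model on GPU 2
```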
1. **Clone the repository**

   ```bash
   git clone <repository>
   cd run_openwebui
   ```
2. **Create a `.env` file with your Hugging Face token**

   ```bash
   echo "HUGGING_TOKEN2=your_hf_token_here" > .env
   ```

   📝 Get your token from huggingface.co/settings/tokens - required for ComfyUI model downloads.
3. **Configure your models**

   Edit `docker-compose.yaml` and uncomment the services your hardware supports, based on the example deployments above.
4. **Set GPU visibility**

   Adjust `CUDA_VISIBLE_DEVICES` in each service to match your GPU IDs:

   ```yaml
   environment:
     CUDA_VISIBLE_DEVICES: "0"      # Single GPU
     # CUDA_VISIBLE_DEVICES: "0,1"  # Multi-GPU
   ```
5. **Create required directories**

   ```bash
   sudo mkdir -p /mnt/nas/hf_vllm_models
   sudo mkdir -p /mnt/nas/ollama_webui/webui
   sudo mkdir -p /mnt/nas/hf_comfyui_models
   ```
6. **Start the services** (see the verification commands after this list)

   ```bash
   docker compose up -d
   ```
7. **Access the interfaces**
   - Open WebUI: http://localhost:8010
   - ComfyUI: http://localhost:8188 (if enabled)
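Before opening the browser, you can confirm everything actually started (step 6); vLLM services can take several minutes to load weights on first run:

```bash
# All enabled services should show "running"
docker compose ps

# Follow a specific service's logs, e.g. the 20B model
docker compose logs -f vllm-gptoss-20b
```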
- Open WebUI (port 8010): Chat interface for all models
- Apache Tika (port 8009): Document/PDF extraction and OCR
- vLLM GPT-OSS-20B (port 8012): 20B parameter model for general text
- vLLM GPT-OSS-120B (port 8011): 120B parameter model for advanced tasks
- vLLM Llama-3.2-11B-Vision (port 8015): Multimodal vision/text model
- ComfyUI (port 8188): Flux1-dev image generation
- Ollama (port 11434): For models not compatible with vLLM
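The vLLM services expose an OpenAI-compatible API, so you can also query them directly, bypassing Open WebUI. A minimal example against the 20B endpoint - note that the `model` value must match the name the service was launched with; `openai/gpt-oss-20b` here is an assumption:

```bash
curl http://localhost:8012/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'
```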
Models are commented out to prevent deployment failures on systems with insufficient VRAM. Uncomment only the services your GPU(s) can handle:
```yaml
# Uncomment the services you want to enable:
services:
  # vllm-gptoss-120b:  # Requires 2x A6000
  vllm-gptoss-20b:     # Requires 1x A6000
  # comfyui:           # Requires 32GB+ VRAM
```
When running multiple services, assign different GPUs to avoid conflicts:
```yaml
# Service 1 on GPU 0
CUDA_VISIBLE_DEVICES: "0"

# Service 2 on GPU 1
CUDA_VISIBLE_DEVICES: "1"

# Service using multiple GPUs
CUDA_VISIBLE_DEVICES: "0,2"
```
For limited VRAM, adjust these parameters:

- `--gpu-memory-utilization`: Reduce from 0.95 to 0.90
- `--max-num-seqs`: Reduce concurrent requests
- `LOW_VRAM=true`: Enable for ComfyUI on <16GB cards
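Put together, a memory-constrained vLLM service might be launched like this (a sketch - flag values are illustrative starting points, and the model name is assumed):

```yaml
# Illustrative vLLM server arguments for a service under memory pressure
command: >
  --model openai/gpt-oss-20b
  --gpu-memory-utilization 0.90
  --max-num-seqs 8
  --max-model-len 8192
```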
- **Out of Memory**: Reduce `--max-model-len` or disable concurrent services
- **Models not loading**: Check volume mounts and ensure model files are downloaded
- **Connection refused**: Verify services are on the same Docker network (`oi_net`)
- **ComfyUI model download fails**: Ensure your Hugging Face token is set correctly in `.env`
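A few commands that help when diagnosing the issues above:

```bash
# Check which containers are attached to the shared network
docker network inspect oi_net

# Watch GPU memory while services start, to catch OOM early
watch -n 1 nvidia-smi

# Inspect a failing service's recent logs
docker compose logs --tail=100 comfyui
```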