Awesome Private AI

Curated list of tools, frameworks, and resources for running, building, and deploying AI privately — on-prem, air-gapped, or self-hosted.

Private AI enables you to keep your data, models, and infrastructure under your control, avoiding unnecessary exposure to third parties. This list covers inference runtimes, model management, privacy tools, and more.

Contents

  • Inference Runtimes & Backends
  • Model Management & Serving
  • Fine-Tuning & Adapters
  • Vector Databases & Embeddings
  • Agents & Orchestration
  • VS Code Plugins & Extensions
  • Privacy, Security & Governance
  • Models for Private Deployment
  • UI & Interaction Layers
  • Datasets & Data Prep
  • Learning Resources & Research
  • AI Routers & API Aggregators
  • Contributing
  • License

Inference Runtimes & Backends

Engines and frameworks to run LLMs, vision, and multimodal models locally.

  • vLLM - High-throughput, low-latency inference engine for LLMs.
  • whisper.cpp - C++ port of OpenAI's Whisper automatic speech recognition model, optimized for local CPU/GPU inference with no internet connection required.
  • mlx-lm - Fast, Apple Silicon-optimized LLM inference engine for running models locally and privately.
  • Jan - Privacy-first, offline AI assistant and LLM runtime for local, secure inference.
  • LM Studio - Cross-platform desktop app for running local LLMs with an easy-to-use interface.
  • Cherry Studio - Powerful, customizable cross-platform desktop app for LLM inference with built-in web search, RAG, MCP support, and a quick-assistant hotkey to summon your LLM from anywhere. Supports a wide range of providers and OpenAI-compatible endpoints for local inference.
  • LLM-D - Kubernetes-native, distributed LLM inference framework for scalable self-hosted deployments.
  • Ollama - Local LLM runner with simple model packaging. Uses a llama.cpp backend and serves models with sensible defaults.
  • llama.cpp - Portable LLM inference in C/C++, well suited to hybrid CPU + GPU setups.
  • ik_llama.cpp - Fork of llama.cpp with bleeding edge feature implementations and quantization improvements.
  • text-generation-inference - Optimized serving stack from Hugging Face.
  • GPT4All - Local desktop model runner.
  • exo - Run your own AI cluster at home with everyday devices. Dynamic model partitioning across multiple devices like iPhones, Macs, and Linux machines.
  • exllama3 - Optimized quantization and inference library for running LLMs locally on modern consumer-grade GPUs. Pair with TabbyAPI for an API server.
  • tabbyAPI - Official API server for running exllamav2 and exllamav3 models. Aims to be a friendly backend with high customizability and an idiomatic OpenAI-compatible API.
  • YALS (Yet another llamacpp server) - TabbyAPI's sister project, adapted for llama.cpp and GGUF models. Built from the ground up using libllama instead of wrapping llama-server.
  • llama-swap - Model swapping for llama.cpp (or any local OpenAI-compatible server).
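Most of the runtimes above (vLLM, LM Studio, Ollama, llama.cpp's server, TabbyAPI, and others) expose an OpenAI-compatible HTTP API, so one small client works against any of them. A minimal stdlib-only sketch; the base URL, port, and model name below are placeholders for whatever your own server exposes:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the payload to a local OpenAI-compatible server and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running local server, e.g. vLLM or Ollama):
# print(chat("http://localhost:8000", "my-local-model", "Hello!"))
```

Because the request shape is the same everywhere, switching backends usually means changing only the base URL and model name.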

Model Management & Serving

Tools for hosting, scaling, and versioning AI models privately.

  • Ray Serve - Scalable Python model serving.
  • Seldon Core - Kubernetes-native model deployment.
  • KServe - Serverless model inference on Kubernetes.
  • BentoML - Model packaging & serving framework.
  • vLLM Production Stack - End-to-end stack for deploying vLLM in production, including orchestration, monitoring, autoscaling, and best practices for private LLM serving.
  • OME (Open Model Engine) - Unified, open-source engine for serving, managing, and scaling LLMs and multimodal models privately. Supports sglang, vLLM, and more.

Fine-Tuning & Adapters

Private workflows for adapting models to your needs.

  • LoRA - Low-rank adaptation technique.
  • PEFT - Parameter-efficient fine-tuning.
  • QLoRA - Memory-efficient LoRA on quantized models.
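All three techniques above share one idea: freeze the base weight matrix W and train only a small low-rank update, so y = x(W + (α/r)·AB). A toy, dependency-free sketch of just the math (not any library's API; the shapes and scaling are illustrative):

```python
def matmul(X, Y):
    """Naive matrix multiply over lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * a for a in row] for row in X]

def lora_forward(x, W, A, B, alpha=2.0, r=1):
    """y = x @ (W + (alpha/r) * A @ B): frozen base weight plus a low-rank update."""
    return matmul(x, add(W, scale(matmul(A, B), alpha / r)))

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity, for the demo)
A = [[0.5], [0.5]]             # trainable down-projection, shape 2 x r with r = 1
B = [[0.0, 0.0]]               # trainable up-projection, zero-initialized
x = [[3.0, 4.0]]

# With B = 0 the adapter is a no-op, so training starts exactly at the base model.
assert lora_forward(x, W, A, B) == matmul(x, W)
```

With rank r much smaller than the matrix dimensions, only A and B are stored and trained, which is why LoRA checkpoints are tiny compared to the base model; QLoRA applies the same update on top of a quantized W.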

Vector Databases & Embeddings

Private semantic search & retrieval-augmented generation.

  • Milvus - Scalable vector database.
  • Qdrant - High-performance vector database and vector search engine.
  • Weaviate - Open-source semantic search engine.
  • Chroma - Local-first vector database.
  • FAISS - Facebook AI Similarity Search.
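Under the hood, every system above answers the same query: given an embedding, which stored vectors are most similar? A brute-force sketch of that core operation (real engines add approximate indexes like HNSW or IVF on top; document IDs here are made up):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k(query, index, k=1):
    """Return the k (doc_id, score) pairs most similar to the query embedding."""
    scored = [(doc_id, cosine(query, emb)) for doc_id, emb in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
assert top_k([0.9, 0.1, 0.0], index)[0][0] == "doc_a"
```

In a RAG pipeline the retrieved documents are then stuffed into the LLM prompt as context; the vector database's job is to make this lookup fast at millions of vectors.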

Agents & Orchestration

Frameworks for chaining private AI tools & agents.

  • AG2 - Open-source operating system for agentic AI with native Ollama support for local model deployment and multi-agent collaboration.

  • LangChain - Agent and LLM orchestration framework.

  • Langflow - Visual workflow builder for creating and deploying AI-powered agents and workflows with built-in API servers.

  • Haystack - End-to-end RAG pipelines.

  • Flowise - No-code LangChain UI.

  • LlamaIndex - Data framework for LLM apps.

  • Trae Agent - LLM-based agent for general-purpose software engineering tasks, usable with self-hosted model endpoints.

  • Qwen-Agent - Open-source agent framework built on Qwen models, with tool use, planning, and memory for local deployments.

  • Crush - Open-source, terminal-based agentic coding assistant that works with local and self-hosted model providers.

  • OpenCode AI - Open-source terminal coding agent for private, local, AI-powered development workflows.

  • PydanticAI - Python agent framework by the Pydantic team, model-agnostic with Ollama support for local deployment.

  • sglang - Fast serving framework for LLMs and vision-language models, with a structured generation language for building composable local workflows.

  • dspy - Framework for programming, rather than prompting, LLMs; builds modular, composable pipelines for private LLM applications.

  • CUA - Enables AI agents to control full operating systems in virtual containers, deployable locally or in the cloud.

  • Bytebot - Desktop agent with its own full virtual desktop; unlike browser-only agents or traditional RPA tools, it can operate any application on the machine.
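Despite their different surfaces, the frameworks above share one core pattern: a loop in which the model either calls a tool or returns a final answer, with each tool result fed back as an observation. A toy sketch with a scripted stand-in for the LLM (all names and the message format are hypothetical, not any framework's API):

```python
def run_agent(model, tools, question, max_steps=5):
    """Minimal ReAct-style loop: the model either calls a tool or answers."""
    history = [("user", question)]
    for _ in range(max_steps):
        action = model(history)                          # the LLM decides what to do
        if action["type"] == "final":
            return action["answer"]
        result = tools[action["tool"]](action["input"])  # run the chosen tool
        history.append(("tool", result))                 # feed the observation back
    raise RuntimeError("agent did not finish")

# Scripted stand-in for a real LLM backend (e.g. a local Ollama model).
def scripted_model(history):
    if history[-1][0] == "user":
        return {"type": "tool", "tool": "add", "input": (2, 3)}
    return {"type": "final", "answer": f"The sum is {history[-1][1]}"}

tools = {"add": lambda pair: pair[0] + pair[1]}
assert run_agent(scripted_model, tools, "What is 2 + 3?") == "The sum is 5"
```

Real frameworks add structured tool schemas, retries, memory, and multi-agent hand-offs around this loop, but the control flow is essentially the one above.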

VS Code Plugins & Extensions

Privacy-first, open-source agentic coding plugins and extensions for VS Code and other editors.

  • Roo Code - Open-source agentic coding assistant that runs in your editor and supports local models (VS Code extension).
  • cline - Open-source autonomous coding agent that works with local models and MCP tools (VS Code extension).

Privacy, Security & Governance

Keep AI deployments secure and compliant.

  • BlindAI - Confidential AI inference using TEEs.
  • OpenFL - Federated learning framework.
  • Flower - Federated learning at scale.
  • Concrete - Fully homomorphic encryption for AI.
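OpenFL and Flower both build on federated averaging (FedAvg): each client trains on its own data and shares only weight updates, which the server averages, weighted by local dataset size, so raw data never leaves the client. A toy sketch of just the aggregation step (real systems add secure aggregation, sampling, and many rounds):

```python
def fed_avg(client_weights, client_sizes):
    """Average client weight vectors, weighted by each client's dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with different amounts of local data; only weights are shared.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
assert fed_avg(clients, sizes) == [2.5, 3.5]
```

The client with three times the data pulls the average three times as hard, which is the size weighting in the FedAvg objective.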

Models for Private Deployment

Open-weight models and model libraries you can self-host.

  • LLaMA 3 - Meta’s open-weight language model.
  • Mistral 7B - Dense 7B parameter model.
  • Qwen 3 - A wide variety of general and specialized models in both dense and "Mixture of Experts" formats.
  • Kimi K2 - Mixture-of-Experts model with 32 billion activated parameters and 1 trillion total parameters.
  • Phi-4 - Small, high-quality models from Microsoft.
  • Mixtral - Mixture-of-experts model.
  • Falcon - Open-source model from TII.
  • Gemma 3 - Open model family from Google.
  • MLX Community - Community-driven Hugging Face page for open MLX models, optimized for Apple Silicon and private deployment.
  • Bielik - Open-source project providing data, tools, and LLMs for the Polish AI ecosystem.
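Several models above (Mixtral, the Qwen 3 MoE variants, Kimi K2) are Mixture-of-Experts: a learned router activates only a few experts per token, which is how Kimi K2 runs roughly 32B of its 1T parameters per forward pass. A toy sketch of top-k gating with scalar "experts" (real routers operate on vectors and add load balancing):

```python
import math

def top_k_gate(logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k=2):
    """Combine only the selected experts' outputs, weighted by the gate."""
    return sum(weight * experts[i](x) for i, weight in top_k_gate(gate_logits, k))

experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1]
# The router strongly prefers experts 0 and 1; expert 2 is never evaluated.
out = moe_forward(10.0, experts, [5.0, 5.0, -5.0], k=2)
assert abs(out - (0.5 * 11.0 + 0.5 * 20.0)) < 1e-9
```

For self-hosting, the practical consequence is that all experts must fit in memory, but per-token compute scales with the activated parameters, not the total.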

UI & Interaction Layers

Self-hosted chat & AI frontends.

  • Chatbot UI - Open-source ChatGPT clone.
  • LibreChat - Enhanced web UI for LLMs.
  • AnythingLLM - Full-stack private LLM workspace.
  • Open WebUI - Widely recommended web UI frontend with built-in search, web scraping, RAG, and optional user authentication.

Datasets & Data Prep

Create and manage private training corpora.

  • OpenWebText - Open dataset similar to GPT training data.
  • RedPajama - Open LLM training dataset.
  • Datamixers - Privacy-focused data preprocessing tools.

Learning Resources & Research

Guides, papers, and tutorials on private AI.

#TODO

AI Routers & API Aggregators

Centralized routers and proxy layers for aggregating, governing, and securing your private AI stack. These tools simplify connections to multiple model servers, optimize LLM routing, and provide observability, security, and compliance.

  • Nexus - Open-source AI router to aggregate Model Context Protocol (MCP) servers, intelligently route requests to the best LLMs, and provide security, governance, observability, and simplified architecture for private AI deployments. Blog
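At their simplest, routers like this sit in front of several OpenAI-compatible backends and pick one per request. A toy sketch of rule-based routing by model-name prefix (backend names and URLs are made up; production routers add health checks, auth, rate limits, and observability):

```python
def route(model_name, routes, default):
    """Pick a backend URL by model-name prefix; fall back to the default."""
    for prefix, backend in routes.items():
        if model_name.startswith(prefix):
            return backend
    return default

routes = {
    "gpt-oss": "http://vllm-node:8000",    # large models -> GPU server
    "qwen":    "http://ollama-box:11434",  # small models -> workstation
}
assert route("qwen3-4b", routes, "http://fallback:8080") == "http://ollama-box:11434"
assert route("mystery-model", routes, "http://fallback:8080") == "http://fallback:8080"
```

Because every backend speaks the same API, clients only ever see one endpoint, and the routing policy can change without touching application code.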

Contributing

Contributions are welcome! See Contributing.

License

Released under the CC0-1.0 license; see LICENSE.
