LMIgnite sets your cluster on fire with LLM deployments. It's your one-click solution for deploying high-performance, enterprise-grade LLM serving infrastructure to your own cluster and cloud environments.
LMIgnite is:
- Self-hosted: You run your LLMs on your own machines (physical or virtual). It's cheap and private.
- High-performance: We deliver the best performance by deeply integrating open-source LLM projects, including the inference engine (vLLM), inter-inference-engine communication (LMCache), and production-level orchestration (vLLM production stack).
Feature highlights:
- 🌐 Easy-to-use: Deploy LLMs right from your browser.
- 🚀 One-click runnable: Run the bash script, and the deployment webpage pops up for you.
- ⚡ 3-10x faster response times with our own open-source projects (LMCache and vLLM production stack).
- 🏢 Enterprise-ready: Multi-tenancy, autoscaling, and high availability.
- 🔧 Wide support: AWS, GCP, Azure, Lambda, and on-premises.
- 📊 Built-in monitoring and performance analytics.
| Environment | Status | Notes |
|---|---|---|
| Lambda Labs | ✅ Available | Fully supported and tested |
| NEBIUS | 🚧 Next Release | Support coming in the next release |
| Google GKE | 🚧 Next Release | Support coming in the next release |
| AWS | 🚧 Planned | Planned for future support |
| Azure | 🚧 Planned | Planned for future support |
| RunPod | 🚧 Planned | Planned for future support |
| FluidStack | 🚧 Planned | Planned for future support |
| Paperspace | 🚧 Planned | Planned for future support |
| Local Cluster | 🚧 Planned | Planned for future support |
LMIgnite opens a browser-based interface on your laptop for managing your clusters. Currently, we support macOS with a one-click script that automatically installs dependencies and launches the browser for you.
For users on Windows/Linux, you may need to install `docker compose` manually before following the instructions below.
Video tutorial: link.
For macOS users:
On your local laptop, open your terminal (press Command (⌘) + Space, type `terminal` (or your preferred terminal), and hit Enter).
Then, copy and paste the following command into your terminal:
```bash
bash <(curl -fsSL https://raw.githubusercontent.com/LMCache/LMIgnite/refs/heads/main/install.command)
```
This script will guide you through installation and automatically open a browser for you to deploy LLMs on your own cloud!
For Linux/Windows users:
Make sure you have `docker compose` installed. Then, open your terminal and run:

```bash
git clone https://github.com/LMCache/LMIgnite && cd LMIgnite && docker compose up
```

After `docker compose up` launches the containers, open your favorite browser and go to http://localhost:3001/.
Video tutorial: link.
LMIgnite currently supports two types of secrets:
- Lambda Cloud API key (for connecting to Lambda Cloud; support for other environments is coming soon)
- (Optional) Hugging Face access token (for accessing gated LLM models)
See the documentation for how to get these.
To add your secrets:
- In the left sidebar, click Secrets
- Select Lambda and click Add Secret to add your Lambda Cloud API key
- (Optional) Select Hugging Face and click Add Secret to add your Hugging Face token
Video tutorial: link.
- In the left sidebar, click Cluster, then hit + Create Cluster
- Fill in the Cluster Configuration:
- Name (e.g., test)
- Cloud Provider (e.g., Lambda Labs)
- Region (e.g., us-south-1)
- GPU Type & Count (e.g., 8 × H100)
- Click Create Cluster at the bottom right
- Wait until the status shows Active (Pending → init → wait_k8s → Active)
Video tutorial: link.
- In the left sidebar, click Deployments, then hit + Create Deployment
- Search or select from existing model cards (e.g., meta-llama/Llama-3.1-8B-Instruct)
- Configure basics:
- Deployment Name: give it a descriptive name (e.g., llama8b)
- Target Cluster: select one of your Active clusters
- (Optional) Hugging Face token: select one of the tokens you previously added in the Secrets section
- Click Create Deployment to quick-start, or Next: Advanced for fine control
- Monitor the deployment status progression
Video tutorial: link (at the end of this video).
We include two things that you can play with:
- An embedded chatbot interface that lets you send prompts and view responses.
- An OpenAI-compatible API endpoint URL that you can use to send requests and receive responses in production.
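As a sketch of how the endpoint can be consumed (the URL and model name below are placeholders, not values LMIgnite guarantees; copy the actual endpoint URL from your deployment's page), requests follow the standard OpenAI chat-completions format:

```python
import json
import urllib.request

# Placeholder endpoint -- replace with the URL shown on your deployment's page.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(url: str, model: str, prompt: str) -> urllib.request.Request:
    """Package a prompt as an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request(ENDPOINT, "meta-llama/Llama-3.1-8B-Instruct", "Hello!")
# To actually send it (requires a running deployment):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries (for example, the official `openai` Python SDK with its `base_url` pointed at your deployment) work the same way.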
Check our online documentation for detailed explanations!
- If port 3001 is in use, you can change it in `docker-compose.yml`.
- If cluster creation fails, try switching to a different region.
- Some models on Hugging Face are "gated" and require access approval.
- Check deployment logs if creation fails.
We welcome contributions! Please check our documentation for development guidelines.
This project is licensed under the Apache License, Version 2.0.
See the LICENSE file for details.