¹Shanghai AI Laboratory, ²Shanghai Innovation Institute, ³The Chinese University of Hong Kong,
⁴Shanghai Jiao Tong University, ⁵Zhejiang University of Technology
We introduce a stand-alone, decoder-only autoregressive model, trained from scratch, that unifies a broad spectrum of image generation tasks, including text-to-image generation, image pair generation, subject-driven generation, multi-turn image editing, controllable generation, and dense prediction.
User Demo

See `user_demo.MP4` for a video demonstration.
[2025-08-02] 🎉🎉🎉 We released the inference code for image-to-image tasks and the all-in-one model checkpoints on HuggingFace.
[2025-07-25] 🎉🎉🎉 We released the technical report on arXiv.
[2025-04-03] 🎉🎉🎉 Lumina-mGPT 2.0 is released!
- Text-to-Image / Image Pair Generation Inference & Checkpoints
- Finetuning code
- Technical Report
- All-in-One Inference & Checkpoints
```bash
git clone https://github.com/Alpha-VLLM/Lumina-mGPT-2.0.git && cd Lumina-mGPT-2.0
conda create -n lumina_mgpt_2 python=3.10 -y
conda activate lumina_mgpt_2
pip install -r requirements.txt
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --no-build-isolation
pip install -e .
```
Please find the flash-attn wheel matching your CUDA, PyTorch, and Python versions from this link.
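If you are unsure which wheel fits your environment: the wheel filename encodes the CUDA version, torch version, C++ ABI flag, and Python tag. A small snippet to print your environment's values (assumes PyTorch is already installed):

```python
import sys
import torch

# The wheel above is flash_attn-2.7.4.post1+cu12torch2.3cxx11abiFALSE-cp310-...,
# i.e. CUDA 12, torch 2.3, CXX11 ABI off, CPython 3.10. Compare with this environment:
print("python tag:", f"cp{sys.version_info.major}{sys.version_info.minor}")
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("cxx11 abi:", torch._C._GLIBCXX_USE_CXX11_ABI)
```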
Download the MoVQGAN weights and place them at `lumina_mgpt/movqgan/270M/movqgan_270M.ckpt`:
```bash
mkdir -p lumina_mgpt/movqgan/270M
wget -O lumina_mgpt/movqgan/270M/movqgan_270M.ckpt https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_270M.ckpt
```
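As a quick sanity check that the download completed and is a valid checkpoint file (an illustrative snippet, not part of the repository):

```python
import torch

# Load on CPU only to confirm the file deserializes; the repo code loads it properly.
ckpt = torch.load("lumina_mgpt/movqgan/270M/movqgan_270M.ckpt", map_location="cpu")
print(f"checkpoint type: {type(ckpt).__name__}, top-level entries: {len(ckpt)}")
```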
```bash
python generate_examples/generate.py \
    --model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
    --cfg 4.0 --top_k 4096 --temperature 1.0 --width 768 --height 768
```
We provide two acceleration strategies: Speculative Jacobi Decoding (`--speculative_jacobi`) and Model Quantization (`--quant`).
```bash
python generate_examples/generate.py \
    --model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
    --cfg 4.0 --top_k 4096 --temperature 1.0 --width 768 --height 768 \
    --speculative_jacobi --quant
```
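For intuition, Speculative Jacobi Decoding drafts a window of future tokens and verifies them in a single forward pass, committing the longest prefix the model agrees with so that multiple tokens can be accepted per step. A minimal toy sketch of that verify-and-accept loop (illustrative only; `model`, greedy decoding, and the window handling are assumptions, not the repository's implementation):

```python
import torch

@torch.no_grad()
def jacobi_step(model, prefix, draft):
    """One Jacobi iteration over a drafted window of future tokens.

    prefix: (1, T) tokens already accepted; draft: (1, W) guessed next tokens.
    Returns the tokens committed this step and a refreshed draft for the rest.
    """
    seq = torch.cat([prefix, draft], dim=1)
    logits = model(seq).logits                             # (1, T + W, vocab)
    # Greedy prediction at each drafted position, conditioned on everything
    # to its left in `seq` (the accepted prefix plus earlier draft tokens).
    pred = logits[:, prefix.shape[1] - 1 : -1].argmax(-1)  # (1, W)
    # The longest draft prefix the model reproduces is verified correct, and
    # the prediction right after it depends only on verified tokens.
    n_match = int((pred == draft)[0].long().cumprod(0).sum())
    accepted = pred[:, : n_match + 1]
    new_draft = pred[:, n_match + 1 :]   # recycle the remainder as next guess
    return accepted, new_draft
```

Even with zero matches, one token is still committed per forward pass, so the loop never falls below plain autoregressive decoding; when several drafted tokens are accepted at once, you get the speedup reflected in the table below.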
For reference, we report the inference time and GPU memory usage on a single A100:
| Method | Inference Time | Inference GPU Memory | Description |
|---|---|---|---|
| Lumina-mGPT 2.0 | 694s | 80 GB | ✅ Recommended |
| + speculative_jacobi | 324s | 79.2 GB | ✅ Recommended |
| + speculative_jacobi & quant | 304s | 33.8 GB | |
For image-to-image inference, you can refer to sample_i2i.sh. We DO NOT recommend `speculative_jacobi` for image-to-image inference.
```bash
# controllable generation
python generate_examples/generate.py \
    --model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
    --cfg 4.0 --top_k 4096 --temperature 1.0 --width 512 --height 1024 --task i2i \
    --i2i_task depth --image_path "assets/depth.png" \
    --image_prompt "A rubber outdoor basketball. On a sunlit outdoor court, it bounces near a vibrant mural, casting a long shadow on the asphalt as children eagerly chase it."
```
```bash
# subject-driven generation
python generate_examples/generate.py \
    --model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
    --cfg 4.0 --top_k 4096 --temperature 1.0 --width 512 --height 1024 --task i2i \
    --i2i_task subject --image_path "assets/subject.png" \
    --image_prompt "On a bustling city rooftop at sunset, this item gleams in a tall glass as the skyline silhouettes in the background, the air filled with laughter and clinking glasses."
```
For finetuning instructions, please refer to TRAIN.md.
| Model | Size | Resolution | Checkpoint | Description |
|---|---|---|---|---|
| Lumina-mGPT 2.0 | 7B | 768px | 7B | Text-to-Image |
| Lumina-mGPT 2.0 (Omni) | 7B | 768px & 512px (image-to-image) | 7B-Omni | All-in-One |
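If you prefer to fetch a checkpoint programmatically rather than through the links above, one option is `huggingface_hub` (a sketch; the repo id matches the `--model_path` used in the commands above, and `local_dir` is an arbitrary choice):

```python
from huggingface_hub import snapshot_download

# Download the text-to-image checkpoint files into a local directory.
path = snapshot_download(
    repo_id="Alpha-VLLM/Lumina-mGPT-2.0",
    local_dir="checkpoints/Lumina-mGPT-2.0",
)
print("checkpoint downloaded to:", path)
```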
Thanks to the following open-source projects for their wonderful work and codebases!
- Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
- Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
We are hiring interns and full-time researchers at the Alpha VLLM Group, Shanghai AI Lab. If you are interested, please contact [email protected].
```bibtex
@article{xin2025lumina,
  title={Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling},
  author={Xin, Yi and Yan, Juncheng and Qin, Qi and Li, Zhen and Liu, Dongyang and Li, Shicheng and Huang, Victor Shea-Jay and Zhou, Yupeng and Zhang, Renrui and Zhuo, Le and others},
  journal={arXiv preprint arXiv:2507.17801},
  year={2025}
}
```