¹Shanghai AI Laboratory, ²Shanghai Innovation Institute, ³The Chinese University of Hong Kong,
⁴Shanghai Jiao Tong University, ⁵Zhejiang University of Technology
We introduce a stand-alone, decoder-only autoregressive model, trained from scratch, that unifies a broad spectrum of image generation tasks, including text-to-image generation, image pair generation, subject-driven generation, multi-turn image editing, controllable generation, and dense prediction.
User Demo

See `user_demo.MP4` for a video demonstration.
[2025-08-02] 🎉🎉🎉 We released the inference code for image-to-image tasks and the all-in-one model checkpoints on HuggingFace.
[2025-07-25] 🎉🎉🎉 We released the technical report on arXiv.
[2025-04-03] 🎉🎉🎉 Lumina-mGPT 2.0 is released!
- Text-to-Image / Image Pair Generation Inference & Checkpoints
- Finetuning code
- Technical Report
- All-in-One Inference & Checkpoints
```bash
git clone https://github.com/Alpha-VLLM/Lumina-mGPT-2.0.git && cd Lumina-mGPT-2.0
conda create -n lumina_mgpt_2 python=3.10 -y
conda activate lumina_mgpt_2
pip install -r requirements.txt
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --no-build-isolation
pip install -e .
```
Please find the flash-attn wheel matching your CUDA, PyTorch, and Python versions from this link.
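If you are unsure which wheel fits your environment: the wheel filename encodes the CUDA version, torch version, C++ ABI flag, and Python tag. A small snippet to print your environment's values (assumes PyTorch is already installed):

```python
import sys
import torch

# The wheel above is flash_attn-2.7.4.post1+cu12torch2.3cxx11abiFALSE-cp310-...,
# i.e. CUDA 12, torch 2.3, CXX11 ABI off, CPython 3.10. Compare with this environment:
print("python tag:", f"cp{sys.version_info.major}{sys.version_info.minor}")
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("cxx11 abi:", torch._C._GLIBCXX_USE_CXX11_ABI)
```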
Download the MoVQGAN weights and place them at `lumina_mgpt/movqgan/270M/movqgan_270M.ckpt`:
```bash
mkdir -p lumina_mgpt/movqgan/270M
wget -O lumina_mgpt/movqgan/270M/movqgan_270M.ckpt https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_270M.ckpt
```
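As a quick sanity check that the download completed and is a valid checkpoint file (an illustrative snippet, not part of the repository):

```python
import torch

# Load on CPU only to confirm the file deserializes; the repo code loads it properly.
ckpt = torch.load("lumina_mgpt/movqgan/270M/movqgan_270M.ckpt", map_location="cpu")
print(f"checkpoint type: {type(ckpt).__name__}, top-level entries: {len(ckpt)}")
```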
```bash
python generate_examples/generate.py \
    --model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
    --cfg 4.0 --top_k 4096 --temperature 1.0 --width 768 --height 768
```
We provide two acceleration strategies: Speculative Jacobi Decoding (`--speculative_jacobi`) and Model Quantization (`--quant`).
```bash
python generate_examples/generate.py \
    --model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
    --cfg 4.0 --top_k 4096 --temperature 1.0 --width 768 --height 768 \
    --speculative_jacobi --quant
```
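For intuition, Speculative Jacobi Decoding drafts a window of future tokens and verifies them in a single forward pass, committing the longest prefix the model agrees with so that multiple tokens can be accepted per step. A minimal toy sketch of that verify-and-accept loop (illustrative only; `model`, greedy decoding, and the window handling are assumptions, not the repository's implementation):

```python
import torch

@torch.no_grad()
def jacobi_step(model, prefix, draft):
    """One Jacobi iteration over a drafted window of future tokens.

    prefix: (1, T) tokens already accepted; draft: (1, W) guessed next tokens.
    Returns the tokens committed this step and a refreshed draft for the rest.
    """
    seq = torch.cat([prefix, draft], dim=1)
    logits = model(seq).logits                             # (1, T + W, vocab)
    # Greedy prediction at each drafted position, conditioned on everything
    # to its left in `seq` (the accepted prefix plus earlier draft tokens).
    pred = logits[:, prefix.shape[1] - 1 : -1].argmax(-1)  # (1, W)
    # The longest draft prefix the model reproduces is verified correct, and
    # the prediction right after it depends only on verified tokens.
    n_match = int((pred == draft)[0].long().cumprod(0).sum())
    accepted = pred[:, : n_match + 1]
    new_draft = pred[:, n_match + 1 :]   # recycle the remainder as next guess
    return accepted, new_draft
```

Even with zero matches, one token is still committed per forward pass, so the loop never falls below plain autoregressive decoding; when several drafted tokens are accepted at once, you get the speedup reflected in the table below.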
For reference, we report the inference time and GPU memory usage on a single A100:
| Method | Inference Time | Inference GPU Memory | Description |
|---|---|---|---|
| Lumina-mGPT 2.0 | 694s | 80 GB | ✅ Recommended |
| + speculative_jacobi | 324s | 79.2 GB | ✅ Recommended |
| + speculative_jacobi & quant | 304s | 33.8 GB | |
For image-to-image inference, you can refer to sample_i2i.sh. We DO NOT recommend `speculative_jacobi` for image-to-image inference.
```bash
# controllable generation
python generate_examples/generate.py \
    --model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
    --cfg 4.0 --top_k 4096 --temperature 1.0 --width 512 --height 1024 --task i2i \
    --i2i_task depth --image_path "assets/depth.png" \
    --image_prompt "A rubber outdoor basketball. On a sunlit outdoor court, it bounces near a vibrant mural, casting a long shadow on the asphalt as children eagerly chase it."
```
```bash
# subject-driven generation
python generate_examples/generate.py \
    --model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
    --cfg 4.0 --top_k 4096 --temperature 1.0 --width 512 --height 1024 --task i2i \
    --i2i_task subject --image_path "assets/subject.png" \
    --image_prompt "On a bustling city rooftop at sunset, this item gleams in a tall glass as the skyline silhouettes in the background, the air filled with laughter and clinking glasses."
```
For finetuning instructions, please refer to TRAIN.md.
| Model | Size | Resolution | Checkpoint | Description |
|---|---|---|---|---|
| Lumina-mGPT 2.0 | 7B | 768px | 7B | Text-to-Image |
| Lumina-mGPT 2.0 (Omni) | 7B | 768px & 512px (image-to-image) | 7B-Omni | All-in-One |
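If you prefer to fetch a checkpoint programmatically rather than through the links above, one option is `huggingface_hub` (a sketch; the repo id matches the `--model_path` used in the commands above, and `local_dir` is an arbitrary choice):

```python
from huggingface_hub import snapshot_download

# Download the text-to-image checkpoint files into a local directory.
path = snapshot_download(
    repo_id="Alpha-VLLM/Lumina-mGPT-2.0",
    local_dir="checkpoints/Lumina-mGPT-2.0",
)
print("checkpoint downloaded to:", path)
```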
Thanks to the following open-source projects for their wonderful work and codebases!
- Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
- Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
We are hiring interns and full-time researchers at the Alpha VLLM Group, Shanghai AI Lab. If you are interested, please contact [email protected].
```bibtex
@article{xin2025lumina,
  title={Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling},
  author={Xin, Yi and Yan, Juncheng and Qin, Qi and Li, Zhen and Liu, Dongyang and Li, Shicheng and Huang, Victor Shea-Jay and Zhou, Yupeng and Zhang, Renrui and Zhuo, Le and others},
  journal={arXiv preprint arXiv:2507.17801},
  year={2025}
}
```