

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling


¹Shanghai AI Laboratory,   ²Shanghai Innovation Institute,   ³The Chinese University of Hong Kong,

⁴Shanghai Jiao Tong University,   ⁵Zhejiang University of Technology

📚 Introduction

We introduce a stand-alone, decoder-only autoregressive model, trained from scratch, that unifies a broad spectrum of image generation tasks, including text-to-image generation, image pair generation, subject-driven generation, multi-turn image editing, controllable generation, and dense prediction.

User Demo

[Video: user_demo.mp4]

Architecture

[Figure: architecture overview]

🔥 News

[2025-08-02] 🎉🎉🎉 We released the inference code for image-to-image tasks and the all-in-one model checkpoints on HuggingFace.

[2025-07-25] 🎉🎉🎉 We released the technical report on arXiv.

[2025-04-03] 🎉🎉🎉 Lumina-mGPT 2.0 is released!

📝 Open-source Plan

  • Text-to-Image / Image Pair Generation Inference & Checkpoints
  • Finetuning code
  • Technical Report
  • All-in-One Inference & Checkpoints

📽️ Demo Examples

Qualitative Performance

[Figure: qualitative generation results]

Comparison with Lumina-mGPT and Janus Pro

[Figure: side-by-side comparison with Lumina-mGPT and Janus Pro]

🚀 Quick Start

⚙️ Installation

1. Create a conda environment

git clone https://github.com/Alpha-VLLM/Lumina-mGPT-2.0.git && cd Lumina-mGPT-2.0
conda create -n lumina_mgpt_2 python=3.10 -y
conda activate lumina_mgpt_2

2. Install dependencies

pip install -r requirements.txt
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --no-build-isolation
pip install -e .

Please pick the flash-attn wheel that matches your CUDA, PyTorch, and Python versions from the flash-attention releases page (https://github.com/Dao-AILab/flash-attention/releases).
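
Before installing, it can help to confirm that your local versions match the tags encoded in the wheel filename (cp310, torch2.3, cu12 in the example above). A minimal check:

# Print the versions that must match the flash-attn wheel tags.
import sys
import torch

print(f"python: {sys.version_info.major}.{sys.version_info.minor}")  # wheel tag cp310 -> 3.10
print(f"torch:  {torch.__version__}")                                # wheel tag torch2.3 -> 2.3.x
print(f"cuda:   {torch.version.cuda}")                               # wheel tag cu12 -> CUDA 12.x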

3. Download MoVQGAN

Download the MoVQGAN weights and place them at lumina_mgpt/movqgan/270M/movqgan_270M.ckpt:

mkdir -p lumina_mgpt/movqgan/270M
wget -O lumina_mgpt/movqgan/270M/movqgan_270M.ckpt https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_270M.ckpt
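
Optionally, verify that the checkpoint downloaded completely and deserializes before running inference (a minimal sketch; the path matches the wget command above):

# Sanity check: a complete checkpoint loads as a dict
# (often a plain state dict, or a dict containing "state_dict").
import torch

ckpt_path = "lumina_mgpt/movqgan/270M/movqgan_270M.ckpt"
state = torch.load(ckpt_path, map_location="cpu")
print(type(state).__name__, list(state)[:5] if isinstance(state, dict) else state)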

⛽ Text-to-Image Generation

1. Simple Inference

python generate_examples/generate.py \
--model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
--cfg 4.0 --top_k 4096 --temperature 1.0 --width 768 --height 768
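
The sampling flags follow the standard definitions: --temperature rescales the logits and --top_k truncates sampling to the k most likely tokens. The sketch below is a generic illustration of those two knobs, not the repository's sampler:

# Generic top-k sampling with temperature (illustration only).
# --cfg is the classifier-free guidance scale; it is typically applied
# earlier, mixing logits as: uncond + cfg * (cond - uncond).
import torch

def sample_top_k(logits: torch.Tensor, top_k: int = 4096, temperature: float = 1.0) -> int:
    # Temperature scaling: higher temperature flattens the distribution.
    logits = logits / temperature
    # Keep only the top_k largest logits and renormalize over them.
    vals, idx = torch.topk(logits, min(top_k, logits.numel()))
    probs = torch.softmax(vals, dim=-1)
    # Draw one token id from the truncated distribution.
    return idx[torch.multinomial(probs, 1)].item()

print(sample_top_k(torch.randn(8192), top_k=4096, temperature=1.0))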

2. Accelerated Inference

We provide two acceleration strategies, Speculative Jacobi Decoding (--speculative_jacobi) and model quantization (--quant), which can be combined; a toy sketch of the Jacobi idea follows the benchmark table below.

python generate_examples/generate.py \
--model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
--cfg 4.0 --top_k 4096 --temperature 1.0 --width 768 --height 768 \
--speculative_jacobi --quant

We provide the inference time and GPU memory on a single A100 as a reference:

Method                        | Inference Time | Inference GPU Memory | Description
Lumina-mGPT 2.0               | 694s           | 80 GB                | ✅ Recommended
+ speculative_jacobi          | 324s           | 79.2 GB              | ✅ Recommended
+ speculative_jacobi & quant  | 304s           | 33.8 GB              |

🌟 Image-to-Image Inference

You can refer to sample_i2i.sh. We do NOT recommend --speculative_jacobi for image-to-image inference.

# controllable generation
python generate_examples/generate.py \
--model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
--cfg 4.0 --top_k 4096 --temperature 1.0 --width 512 --height 1024 --task i2i \
--i2i_task depth --image_path "assets/depth.png" \
--image_prompt "A rubber outdoor basketball. On a sunlit outdoor court, it bounces near a vibrant mural, casting a long shadow on the asphalt as children eagerly chase it."

# subject driven generation
python generate_examples/generate.py \
--model_path Alpha-VLLM/Lumina-mGPT-2.0 --save_path save_samples/ \
--cfg 4.0 --top_k 4096 --temperature 1.0 --width 512 --height 1024 --task i2i \
--i2i_task subject --image_path "assets/subject.png" \
--image_prompt "On a bustling city rooftop at sunset, this item gleams in a tall glass as the skyline silhouettes in the background, the air filled with laughter and clinking glasses."

💻 Finetuning

Please refer to TRAIN.md.

🤗 Checkpoints

Model                   | Size | Resolution                      | pth link | Description
Lumina-mGPT 2.0         | 7B   | 768px                           | 7B       | Text-to-Image
Lumina-mGPT 2.0 (Omni)  | 7B   | 768px & 512px (image-to-image)  | 7B-Omni  | All-in-One

📜 Acknowledgements

Thanks to the following open-source codebases for their wonderful work!

🔥 Open Positions

We are hiring interns and full-time researchers at the Alpha VLLM Group, Shanghai AI Lab. If you are interested, please contact [email protected].

🌟 Star History

[Figure: star history chart]

📖 BibTeX

@article{xin2025lumina,
  title={Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling},
  author={Xin, Yi and Yan, Juncheng and Qin, Qi and Li, Zhen and Liu, Dongyang and Li, Shicheng and Huang, Victor Shea-Jay and Zhou, Yupeng and Zhang, Renrui and Zhuo, Le and others},
  journal={arXiv preprint arXiv:2507.17801},
  year={2025}
}
