
🌟🔥 T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Official repository for the paper "T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT".

[📖 Paper] [🤗 Model]


💥 News

👀 Reasoning in Image Generation

Chain-of-Thought (CoT) reasoning with reinforcement learning (RL) has been extensively explored in LLMs and LMMs. However, applying such reasoning strategies to the visual generation domain remains largely unexplored. In this project, we present T2I-R1, a novel reasoning-enhanced text-to-image generation model powered by RL with a bi-level CoT reasoning process.


We identify two levels of CoT that can be utilized to enhance different stages of generation:

  1. 🧠 Semantic-level CoT is the textual reasoning about the image to generate, introduced prior to image generation. The semantic-level CoT designs the global structure of the image, e.g., the appearance and location of each object. Optimizing the semantic-level CoT lets the model explicitly plan and reason about the prompt before generating the subsequent image tokens, making generation easier.

  2. 🎨 Token-level CoT is the intermediate patch-by-patch generation process of the image. Unlike semantic-level CoT, token-level CoT focuses on low-level details such as pixel generation and maintaining visual coherence between adjacent patches. Optimizing the token-level CoT can enhance both the generation quality and the alignment between the prompt and the resulting image.
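As a concrete illustration, the two-level process can be sketched in Python. Everything below (the `DummyModel` class and its method names) is a hypothetical stand-in for illustration only, not the repo's actual API:

```python
# Illustrative sketch of bi-level CoT generation. DummyModel and its
# methods are invented stand-ins, not the actual T2I-R1 API.
class DummyModel:
    def generate_text(self, prompt):
        # Semantic-level CoT: a textual plan of the image's global structure.
        return ["<plan>", "red", "cube", "left", "</plan>"]

    def next_image_token(self, context):
        # Token-level CoT: one patch token, conditioned on everything so far.
        return len(context) % 1024  # fake codebook index

def generate_with_bilevel_cot(model, prompt, num_image_tokens=8):
    plan = model.generate_text(prompt)            # stage 1: plan in text
    context = prompt.split() + plan               # condition on prompt + plan
    image_tokens = []
    for _ in range(num_image_tokens):             # stage 2: patch by patch
        image_tokens.append(model.next_image_token(context + image_tokens))
    return plan, image_tokens

plan, toks = generate_with_bilevel_cot(DummyModel(), "a red cube on the left")
print(len(toks))  # 8
```

The key structural point is that the token-level stage conditions on the semantic-level plan, so improving the plan directly eases the subsequent patch-by-patch generation.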

To better coordinate these two levels of CoT, we introduce BiCoT-GRPO with an ensemble of generation rewards, which seamlessly optimizes both generation CoTs within the same training step.
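To make the reward-ensemble idea concrete, here is a minimal sketch of computing group-relative advantages over averaged rewards, in the spirit of GRPO. The reward functions, sample format, and group size are toy assumptions, not the paper's exact formulation:

```python
# Sketch of group-relative advantages over an ensemble of rewards.
# Rewards and samples here are toy placeholders for illustration.
def group_relative_advantages(reward_ensemble, samples):
    """For one prompt, score each sampled (plan, image) pair with every
    reward function, average the ensemble, then normalize in the group."""
    scores = []
    for s in samples:
        ensemble = [r(s) for r in reward_ensemble]
        scores.append(sum(ensemble) / len(ensemble))
    mean = sum(scores) / len(scores)
    var = sum((x - mean) ** 2 for x in scores) / len(scores)
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(x - mean) / std for x in scores]

# Toy usage: two reward functions over four sampled generations.
rewards = [lambda s: s["quality"], lambda s: s["alignment"]]
group = [{"quality": q, "alignment": a}
         for q, a in [(0.9, 0.8), (0.5, 0.4), (0.7, 0.6), (0.3, 0.2)]]
adv = group_relative_advantages(rewards, group)
print(adv[0] > 0)  # True: the best-scoring sample gets a positive advantage
```

Because each sampled rollout contains both the textual plan and the image tokens from the same policy, a single group-relative update can reinforce both levels of CoT at once.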


🗒️ TODO

  • Release ORM Checkpoint and reward code

  • Release Checkpoint

💪 Get Started

Installation

Clone the repository:

git clone https://github.com/CaraJ7/T2I-R1.git
cd T2I-R1

Create a conda environment:

conda create -n t2i-r1 python=3.10
conda activate t2i-r1

Please follow the official PyTorch instructions to install both the PyTorch and TorchVision dependencies.

Install additional dependencies:

cd src
pip install -r requirements.txt

Note that newer versions of torch, transformers, and trl may also work.

Set up the Reward Model Environment

Make sure to install the reward model packages from our repo: we have made the necessary modifications to support training with DeepSpeed ZeRO-3.

Install GroundingDINO if you want to use the object-detector reward:

cd t2i-r1/src/t2i-r1/src/utils/GroundingDINO
pip install -e .

Install LLaVA if you want to use the ORM reward:

cd t2i-r1/src/t2i-r1/src/utils/LLaVA-NeXT
pip install -e ".[train]"

Prepare Reward Model Checkpoints

Please download the reward model checkpoints you need for training:

cd t2i-r1
mkdir reward_weight
cd reward_weight

# HPS reward
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt

# GIT reward
huggingface-cli download microsoft/git-large-vqav2 --repo-type model --local-dir git-large-vqav2

# GroundingDINO (object detector) reward
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

# ORM reward
huggingface-cli download CaraJ/ORM-T2I-R1 --repo-type model --local-dir ORM-T2I-R1

🚀 Training

cd t2i-r1/src
bash scripts/run_grpo.sh

Notes:

  • Parameters:
    • reward_funcs: The options are hps, git, gdino, and orm. You can choose any combination you need for training. Make sure to substitute the correct checkpoint and config paths in run_grpo.sh.
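For illustration, one way such name-based reward selection could work is a simple registry mapping names to callables; this is a hedged sketch with placeholder scores, not the repo's actual implementation:

```python
# Hypothetical sketch: resolving reward_funcs names like ["hps", "gdino"]
# into callables. The registry entries are placeholders, not real models.
REWARD_REGISTRY = {
    "hps": lambda img, prompt: 0.8,    # human-preference score (placeholder)
    "git": lambda img, prompt: 0.7,    # VQA-based alignment (placeholder)
    "gdino": lambda img, prompt: 0.9,  # object-detector reward (placeholder)
    "orm": lambda img, prompt: 0.6,    # outcome reward model (placeholder)
}

def resolve_reward_funcs(names):
    # Fail fast on typos rather than silently skipping a reward.
    unknown = [n for n in names if n not in REWARD_REGISTRY]
    if unknown:
        raise ValueError(f"unknown reward_funcs: {unknown}")
    return [REWARD_REGISTRY[n] for n in names]

funcs = resolve_reward_funcs(["hps", "gdino"])
print(len(funcs))  # 2
```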

💫 Inference

You can download the checkpoint from here or train the model yourself.

cd t2i-r1/src/infer
python reason_inference.py \
--model_path YOUR_MODEL_CKPT \
--data_path test_data.txt 

📒 Notes

  • Where necessary, we incorporate the corresponding repo of each reward model we use. We modified certain code to adapt it for ZeRO-3 training and deleted unused folders to keep the codebase lightweight.
    • For GroundingDINO, we modify the code in t2i-r1/src/t2i-r1/src/utils/GroundingDINO/groundingdino/models/GroundingDINO/groundingdino.py.
    • For LLaVA (ORM), we modify the code in t2i-r1/src/t2i-r1/src/utils/LLaVA-NeXT/llava/model/builder.py and t2i-r1/src/t2i-r1/src/utils/LLaVA-NeXT/llava/model/llava_arch.py.

🧠 Related Work

Explore our additional research on Autoregressive Text-to-Image Generation and CoT Reasoning

🥳 Acknowledgements

We would like to thank R1-V and Image Generation CoT, upon which our repo is built.

License

This project is released under the Apache License 2.0. We release our checkpoints for research purposes only. Users are free to create images with this tool, but they are expected to comply with local laws and use it responsibly. The developers assume no responsibility for potential misuse.

📄 Cite

@article{jiang2025t2i,
  title={T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT},
  author={Jiang, Dongzhi and Guo, Ziyu and Zhang, Renrui and Zong, Zhuofan and Li, Hao and Zhuo, Le and Yan, Shilin and Heng, Pheng-Ann and Li, Hongsheng},
  journal={arXiv preprint arXiv:2505.00703},
  year={2025}
}
