TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lv*¹, Tianlin Pan*²,³, Chenyang Si²‡†, Zhaoxi Chen⁴, Wangmeng Zuo⁵, Ziwei Liu⁴†, Kwan-Yee K. Wong¹†

¹The University of Hong Kong ²Nanjing University
³University of Chinese Academy of Sciences ⁴Nanyang Technological University
⁵Harbin Institute of Technology

(*Equal Contribution. ‡Project Leader. †Corresponding Author.)

Paper | Project Page | LoRA Weights

About

We propose TACA, a parameter-efficient method that dynamically rebalances cross-modal attention in multimodal diffusion transformers to improve text-image alignment.
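
To make "rebalancing cross-modal attention" concrete, here is a minimal sketch. It assumes the rebalancing is done by multiplying the image-to-text attention logits by a factor `gamma > 1` before the softmax. The function name, the `gamma` parameter, and the toy tokens are hypothetical; this is not the code in `infer/`, and it uses a fixed factor, whereas the method is described as dynamic.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention_weights(queries, keys, num_text, gamma=1.0):
    """Attention weights over a joint [text tokens; image tokens] sequence.

    Assumption for illustration: logits from image queries to text keys
    are multiplied by `gamma`; all other logits are left unchanged.
    """
    dim = len(queries[0])
    weights = []
    for qi, q in enumerate(queries):
        logits = []
        for ki, k in enumerate(keys):
            logit = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if qi >= num_text and ki < num_text:  # image query, text key
                logit *= gamma
            logits.append(logit)
        weights.append(softmax(logits))
    return weights


# Toy sequence: 1 text token followed by 3 image tokens, all identical,
# so an image query spreads its attention uniformly when gamma = 1.
tokens = [[1.0, 0.0] for _ in range(4)]
base = joint_attention_weights(tokens, tokens, num_text=1, gamma=1.0)
boosted = joint_attention_weights(tokens, tokens, num_text=1, gamma=2.0)

if __name__ == "__main__":
    print("image->text weight, gamma=1:", round(base[1][0], 3))
    print("image->text weight, gamma=2:", round(boosted[1][0], 3))
```

In this toy setup the single text token is outnumbered by image tokens, so it receives only a quarter of each image query's attention by default; scaling its logit shifts attention mass back toward the text.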

teaser.mp4

Usage

For Stable Diffusion 3.5, simply run:

python infer/infer_sd3.py

For FLUX.1, run:

python infer/infer_flux.py

Benchmark

Alignment scores on T2I-CompBench for FLUX.1-Dev-based and SD3.5-Medium-based models, with and without TACA. Higher is better for every metric ($\uparrow$); $r$ is the LoRA rank.

Model	Attribute Binding: Color $\uparrow$	Attribute Binding: Shape $\uparrow$	Attribute Binding: Texture $\uparrow$	Object Relationship: Spatial $\uparrow$	Object Relationship: Non-Spatial $\uparrow$	Complex $\uparrow$
FLUX.1-Dev	0.7678	0.5064	0.6756	0.2066	0.3035	0.4359
FLUX.1-Dev + TACA ($r = 64$)	0.7843	0.5362	0.6872	0.2405	0.3041	0.4494
FLUX.1-Dev + TACA ($r = 16$)	0.7842	0.5347	0.6814	0.2321	0.3046	0.4479
SD3.5-Medium	0.7890	0.5770	0.7328	0.2087	0.3104	0.4441
SD3.5-Medium + TACA ($r = 64$)	0.8074	0.5938	0.7522	0.2678	0.3106	0.4470
SD3.5-Medium + TACA ($r = 16$)	0.7984	0.5834	0.7467	0.2374	0.3111	0.4505
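
The absolute scores can be hard to compare across metrics, so the following sketch computes the relative gain of TACA ($r = 64$) over each base model. The numbers are copied from the table; the variable and function names are illustrative only and are not part of the repository.

```python
# Scores copied from the T2I-CompBench table (baseline vs. TACA with r = 64).
METRICS = ["Color", "Shape", "Texture", "Spatial", "Non-Spatial", "Complex"]
SCORES = {
    "FLUX.1-Dev": {
        "baseline": [0.7678, 0.5064, 0.6756, 0.2066, 0.3035, 0.4359],
        "taca_r64": [0.7843, 0.5362, 0.6872, 0.2405, 0.3041, 0.4494],
    },
    "SD3.5-Medium": {
        "baseline": [0.7890, 0.5770, 0.7328, 0.2087, 0.3104, 0.4441],
        "taca_r64": [0.8074, 0.5938, 0.7522, 0.2678, 0.3106, 0.4470],
    },
}


def relative_gain(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline


def gains_by_metric(model):
    """Map each metric name to the relative gain of TACA (r = 64)."""
    rows = SCORES[model]
    return {
        metric: relative_gain(base, taca)
        for metric, base, taca in zip(METRICS, rows["baseline"], rows["taca_r64"])
    }


if __name__ == "__main__":
    for model in SCORES:
        print(model)
        for metric, gain in gains_by_metric(model).items():
            print(f"  {metric:<12} {gain:+.2f}%")
```

On these numbers the Spatial column shows the largest relative gain for both base models (about 16% for FLUX.1-Dev and about 28% for SD3.5-Medium), while Non-Spatial barely changes.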
