Describe the bug
When I use block-level group offloading with use_stream=True on HunyuanVideo Framepack, I want to use torch.compile to accelerate video generation, but it seems that applying compile and group offloading at the same time does not speed up the model forward pass.
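To isolate the interaction from the full pipeline, the same combination can be reduced to a toy module. Below is a minimal sketch with a hypothetical ToyModel, assuming apply_group_offloading accepts any nn.Module whose blocks live in an nn.ModuleList:

import torch
import torch.nn as nn
from diffusers.hooks import apply_group_offloading

# Hypothetical stand-in for the transformer: block_level offloading groups
# the entries of an nn.ModuleList.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(1024, 1024) for _ in range(4)])

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

model = ToyModel()
# Store the compiled blocks back into the ModuleList so they are actually used.
for i, block in enumerate(model.blocks):
    model.blocks[i] = torch.compile(block, dynamic=False)

apply_group_offloading(
    model,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=1,
    use_stream=True,
    non_blocking=True,
)

with torch.no_grad():
    out = model(torch.randn(8, 1024, device="cuda"))

If this toy case compiles and prefetches as expected, the slowdown would be specific to the Framepack pipeline rather than to the compile/offloading combination itself.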
Reproduction
I'm running the code below on an RTX 4090; one latent takes about 3 minutes. Even when I add TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+dynamo", there is no compilation log. I'm using the torch_tensorrt backend, and the result is the same with the inductor backend.
import cv2
import torch
from diffusers import HunyuanVideoFramepackPipeline, HunyuanVideoFramepackTransformer3DModel
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video
from transformers import SiglipImageProcessor, SiglipVisionModel
# framepack_transformer_path, framepack_redux_bfl and hunyuan_model_path are
# local checkpoint paths.
transformer = HunyuanVideoFramepackTransformer3DModel.from_pretrained(
    framepack_transformer_path, torch_dtype=torch.bfloat16
)
feature_extractor = SiglipImageProcessor.from_pretrained(
    framepack_redux_bfl, subfolder="feature_extractor"
)
image_encoder = SiglipVisionModel.from_pretrained(
    framepack_redux_bfl, subfolder="image_encoder", torch_dtype=torch.float16
)
pipe = HunyuanVideoFramepackPipeline.from_pretrained(
    hunyuan_model_path,
    transformer=transformer,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
)
# Assign the compiled blocks back into the ModuleLists; rebinding the loop
# variable alone leaves the original, uncompiled blocks in the model.
for i, block in enumerate(pipe.transformer.transformer_blocks):
    pipe.transformer.transformer_blocks[i] = torch.compile(block, backend="torch_tensorrt", dynamic=False)
for i, block in enumerate(pipe.transformer.single_transformer_blocks):
    pipe.transformer.single_transformer_blocks[i] = torch.compile(block, backend="torch_tensorrt", dynamic=False)
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
for module in (pipe.text_encoder, pipe.text_encoder_2, pipe.transformer):
    apply_group_offloading(
        module,
        onload_device,
        offload_device,
        offload_type="block_level",
        num_blocks_per_group=1,
        use_stream=True,
        non_blocking=True,
    )
pipe.image_encoder.to(onload_device)
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
pipe.vae.to(onload_device)
image_path = "/media/242hdd/image.jpg"
image = cv2.imread(image_path)
height, width, _ = image.shape
output = pipe(
    image=image,
    prompt="A girl",
    height=height,
    width=width,
    num_frames=161,
    num_inference_steps=25,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(0),
    sampling_type="vanilla",
).frames[0]
export_to_video(output, "output.mp4", fps=30)
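To double-check whether Dynamo ever compiled anything, the run can be bracketed with PyTorch's logging and accounting helpers; a minimal sketch (torch._dynamo.utils.compile_times is an internal helper and may change between versions):

import logging
import torch

# Same effect as launching with TORCH_LOGS="+dynamo":
torch._logging.set_logs(dynamo=logging.DEBUG)

# ... run the pipeline as above ...

# Prints a per-function summary of every compilation that actually happened;
# an empty report means torch.compile was never triggered.
print(torch._dynamo.utils.compile_times())

An empty report with the fixed script above would point at the group-offloading hooks suppressing compilation, rather than at the blocks never being compiled in the first place.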
Logs
System Info
- 🤗 Diffusers version: 0.34.0.dev0
- Platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.31
- Running on Google Colab?: No
- Python version: 3.11.11
- PyTorch version (GPU?): 2.8.0.dev20250505+cu128 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.29.3
- Transformers version: 4.50.2
- Accelerate version: 1.5.2
- PEFT version: 0.15.1
- Bitsandbytes version: not installed
- Safetensors version: 0.5.3
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4090, 24564 MiB
NVIDIA GeForce RTX 4090, 24564 MiB
NVIDIA GeForce RTX 4090, 24564 MiB
NVIDIA GeForce RTX 4090, 24564 MiB
NVIDIA GeForce RTX 4090, 24564 MiB
NVIDIA GeForce RTX 4090, 24564 MiB
NVIDIA GeForce RTX 4090, 24564 MiB
NVIDIA GeForce RTX 4090, 24564 MiB
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no