A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction

[2025/1/1] Added training and inference code.
[2025/1/5] Added a demo for evaluating generated videos.
[2025/1/7] Released our 120 prompts.
conda create -n PhyCoBench python=3.8
conda activate PhyCoBench
pip install -r requirements.txt
1. Prepare the generated videos and the corresponding text prompts (a sanity-check sketch follows these steps).
2. Run the following command:
bash scripts/run_eval.sh
This yields the benchmark scores for the evaluated videos.
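The exact layout expected by run_eval.sh is not spelled out here; as a rough illustration, the sketch below checks that every prompt has a matching generated video before launching the evaluation. The file names (prompts.txt, a videos/ directory, a zero-padded index naming scheme) are assumptions, not the repository's documented format.

```python
# Hypothetical pre-flight check: confirm each prompt has a matching video.
# The prompts file, video directory, and naming scheme are assumptions.
from pathlib import Path

PROMPTS_FILE = Path("prompts.txt")  # one text prompt per line (assumed)
VIDEO_DIR = Path("videos")          # videos named 0000.mp4, 0001.mp4, ... (assumed)

prompts = [line.strip() for line in PROMPTS_FILE.read_text().splitlines() if line.strip()]
missing = [i for i in range(len(prompts)) if not (VIDEO_DIR / f"{i:04d}.mp4").exists()]

if missing:
    raise FileNotFoundError(f"No video found for prompt indices: {missing}")
print(f"All {len(prompts)} prompt/video pairs present; ready to run scripts/run_eval.sh")
```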
Our PhyCoPredictor comprises two diffusion models, thus requiring a two-stage training process. In stage one, we train the latent flow diffusion module, and in stage two, we train the latent video diffusion module.
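As a structural illustration of this two-module design, the sketch below shows how flow prediction could feed flow-guided frame prediction. All class and method names are illustrative assumptions, not the actual interfaces in this repository, and the forward passes are placeholders rather than real denoising loops.

```python
# Illustrative sketch of the two-module pipeline; names are assumptions.
import torch
import torch.nn as nn

class LatentFlowDiffusion(nn.Module):
    """Stage-one module: predicts future optical flow in latent space (assumed role)."""
    def forward(self, first_frame_latent: torch.Tensor) -> torch.Tensor:
        # Placeholder: a real model would run iterative diffusion denoising here.
        return torch.zeros_like(first_frame_latent)

class LatentVideoDiffusion(nn.Module):
    """Stage-two module: predicts future frames conditioned on latent flow (assumed role)."""
    def forward(self, first_frame_latent: torch.Tensor, flow_latent: torch.Tensor) -> torch.Tensor:
        # Placeholder for flow-guided frame denoising.
        return first_frame_latent + flow_latent

class PhyCoPredictor(nn.Module):
    """Flow prediction first, then flow-guided frame prediction."""
    def __init__(self):
        super().__init__()
        self.flow_model = LatentFlowDiffusion()
        self.video_model = LatentVideoDiffusion()

    def forward(self, first_frame_latent: torch.Tensor) -> torch.Tensor:
        flow_latent = self.flow_model(first_frame_latent)
        return self.video_model(first_frame_latent, flow_latent)
```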
1. Stage One
Train the latent flow diffusion module from scratch.
bash configs/train_sh/run_latent_flow.sh
Then modify the configuration file to set the weights trained in the previous step as the pre-trained model, and continue with fine-tuning (a sketch of this config edit follows below):
bash configs/train_sh/run_latent_flow.sh
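As a hedged example of the config edit mentioned above, the following sketch uses OmegaConf, which is common in DynamiCrafter-style codebases. The config path, the `model.pretrained_checkpoint` key, and the checkpoint path are all assumptions; consult the actual config file for the real key names.

```python
# Hedged example of pointing the fine-tuning config at the stage-one weights.
# The config path, key name, and checkpoint path are all assumptions.
from omegaconf import OmegaConf

CONFIG_PATH = "configs/train/latent_flow.yaml"            # assumed config location
CHECKPOINT = "checkpoints/latent_flow_scratch/last.ckpt"  # assumed stage-one output

cfg = OmegaConf.load(CONFIG_PATH)
cfg.model.pretrained_checkpoint = CHECKPOINT  # assumed key name
OmegaConf.save(cfg, CONFIG_PATH)
print(f"Config updated to fine-tune from {CHECKPOINT}")
```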
2. Stage Two
Train the latent video diffusion module, initializing it from the DynamiCrafter pre-trained model (see the loading sketch below).
bash configs/train_sh/run_visual_flow.sh
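A hedged sketch of how the DynamiCrafter initialization might be wired up is shown below. The checkpoint path, the 'state_dict' key, and the use of strict=False are assumptions, not the repository's confirmed loading code.

```python
# Hedged sketch: initialize the stage-two module from a DynamiCrafter
# checkpoint. The checkpoint path and 'state_dict' key are assumptions.
import torch
import torch.nn as nn

CKPT = "checkpoints/dynamicrafter/model.ckpt"  # assumed location of the weights

class LatentVideoDiffusion(nn.Module):
    """Stand-in for the stage-two module (see the structural sketch above)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x  # placeholder

checkpoint = torch.load(CKPT, map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)  # Lightning-style checkpoints nest weights

model = LatentVideoDiffusion()
# strict=False tolerates keys (e.g., added flow-conditioning layers) that the
# DynamiCrafter checkpoint does not contain, and vice versa.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"{len(missing)} missing keys, {len(unexpected)} unexpected keys")
```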
Parts of our code are borrowed from FlowFormer++ and DynamiCrafter. We thank the authors of these repositories for their valuable implementations.