
EZ-CLIP: Efficient Zero-Shot Video Action Recognition

Official PyTorch Implementation


🚀 Major Announcement: Published in TMLR 2025!

🎉 EZ-CLIP has evolved into T2L: Efficient Zero-Shot Action Recognition with Temporal Token Learning and is now published in Transactions on Machine Learning Research (TMLR) 2025!
We’ve released a new, enhanced codebase for T2L, incorporating the latest advancements. Visit the new repository for the most up-to-date code and resources:
👉 T2L Repository 👈

This EZ-CLIP repository remains available for reference but may not receive further updates. Explore T2L for the cutting-edge implementation!


Updates

  • 📦 Trained Models: Download pre-trained models from Google Drive.
  • 📄 Published Paper: See details in the TMLR 2025 publication and new T2L repository.

Overview

EZ-CLIP Architecture

EZ-CLIP is an innovative adaptation of CLIP tailored for zero-shot video action recognition. By leveraging temporal visual prompting, it seamlessly integrates temporal dynamics while preserving CLIP’s powerful generalization. A novel motion-focused learning objective enhances its ability to capture video motion, all without altering CLIP’s core architecture.

For the latest advancements, check out T2L: Efficient Zero-Shot Action Recognition with Temporal Token Learning in the T2L Repository.


Introduction

EZ-CLIP tackles the challenge of adapting CLIP for zero-shot video action recognition with a lightweight and efficient approach. Through temporal visual prompting and a specialized learning objective, it captures motion dynamics effectively while retaining CLIP’s generalization capabilities. This makes EZ-CLIP both practical and powerful for video understanding tasks.

The work has been significantly advanced in our TMLR 2025 publication, T2L: Efficient Zero-Shot Action Recognition with Temporal Token Learning. Explore the T2L Repository for the latest developments.

Prerequisites

Set up the environment using the provided requirements.txt:

pip install -r requirements.txt

Model Zoo

Note: All models are based on the publicly available ViT-B/16 CLIP model.

Zero-Shot Results

Trained on Kinetics-400 and evaluated on downstream datasets.

| Model | Input | HMDB-51 | UCF-101 | Kinetics-600 | Model Link |
|---|---|---|---|---|---|
| EZ-CLIP (ViT-B/16) | 8x224 | 52.9 | 79.1 | 70.1 | Link |
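The zero-shot setting can be summarized as: embed the video frames and the class-name prompts in CLIP's shared space, then pick the class with the highest cosine similarity. Below is a minimal conceptual sketch (not the repo's actual code; function and variable names are illustrative, and mean-pooling stands in for the model's temporal prompting):

```python
import torch

def zero_shot_classify(frame_features, text_features):
    """Toy zero-shot video classifier in CLIP's embedding space.

    frame_features: (T, D) per-frame embeddings for one clip.
    text_features:  (C, D) one embedding per class prompt.
    Returns the index of the best-matching class.
    """
    video = frame_features.mean(dim=0)  # simple temporal pooling -> (D,)
    video = video / video.norm()        # unit-normalize the video embedding
    text = text_features / text_features.norm(dim=-1, keepdim=True)
    logits = text @ video               # (C,) cosine similarities
    return logits.argmax().item()

# Toy example with random features (8 frames, ViT-B/16 width 512,
# 51 classes as in HMDB-51).
torch.manual_seed(0)
frames = torch.randn(8, 512)
texts = torch.randn(51, 512)
pred = zero_shot_classify(frames, texts)
print(pred)
```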

Base-to-Novel Generalization Results

Datasets are split into base and novel classes, with models trained on base classes and evaluated on both.

| Dataset | Input | Base Acc. | Novel Acc. | HM | Model Link |
|---|---|---|---|---|---|
| K-400 | 8x224 | 73.1 | 60.6 | 66.3 | Link |
| HMDB-51 | 8x224 | 77.0 | 58.2 | 66.3 | Link |
| UCF-101 | 8x224 | 94.4 | 77.9 | 85.4 | Link |
| SSV2 | 8x224 | 16.6 | 13.3 | 14.8 | Link |
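The HM column is the harmonic mean of base- and novel-class accuracy, the standard summary metric for base-to-novel generalization:

```python
def harmonic_mean(base, novel):
    """Harmonic mean of base- and novel-class accuracy (the HM column)."""
    return 2 * base * novel / (base + novel)

# Check against the UCF-101 row: base 94.4, novel 77.9 -> HM 85.4
print(round(harmonic_mean(94.4, 77.9), 1))  # → 85.4
```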

Data Preparation

Extract videos into frames for efficient processing. See the Dataset_creation_scripts directory for instructions.
Supported datasets: Kinetics-400, Kinetics-600, HMDB-51, UCF-101, and Something-Something V2 (SSV2).
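A common way to extract frames is with ffmpeg. The sketch below only builds the command string (paths, frame rate, and naming pattern are placeholder assumptions; see the Dataset_creation_scripts directory for the repo's actual procedure):

```python
import shlex

def frame_extraction_cmd(video_path, out_dir, fps=30):
    """Build an ffmpeg command that dumps a video into numbered JPEG frames.

    The fps filter resamples the video; img_%05d.jpg produces
    zero-padded frame filenames (img_00001.jpg, img_00002.jpg, ...).
    """
    return (f"ffmpeg -i {shlex.quote(video_path)} -vf fps={fps} "
            f"{shlex.quote(out_dir)}/img_%05d.jpg")

cmd = frame_extraction_cmd("videos/clip.mp4", "frames/clip")
print(cmd)
```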

Training

Train EZ-CLIP with:

python train.py --config configs/K-400/k400_train.yaml

Testing

Evaluate a trained model with:

python test.py --config configs/ucf101/UCF_zero_shot_testing.yaml

Citation

If you find this code or models useful, please cite our work:

TMLR 2025 Publication:

@article{ahmad2025tl,
  title={T2L: Efficient Zero-Shot Action Recognition with Temporal Token Learning},
  author={Shahzad Ahmad and Sukalpa Chanda and Yogesh S Rawat},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=WvgoxpGpuU}
}

arXiv Preprint:

@article{ahmad2023ezclip,
  title={EZ-CLIP: Efficient Zero-Shot Video Action Recognition},
  author={Ahmad, Shahzad and Chanda, Sukalpa and Rawat, Yogesh S},
  journal={arXiv preprint arXiv:2312.08010},
  year={2023}
}

Acknowledgments

This codebase builds upon ActionCLIP. We express our gratitude to the authors for their foundational contributions.
For the latest updates, visit the T2L Repository.


Contact: For questions or issues, please open an issue on this repository or the T2L Repository.

Explore the Future of Zero-Shot Action Recognition with T2L!
👉 T2L Repository 👈
