Awesome Vision-Language-Action Models

A curated list of Vision-Language-Action (VLA) models, benchmarks, and datasets for robotic manipulation and embodied AI.

Table of Contents

  • VLA Models
  • Benchmarks
  • Datasets
  • Contributing
  • Contact

VLA Models

SpatialVLA

  • Paper: https://arxiv.org/abs/2501.15830
  • Status: ✅ Successfully reproduced the results in the paper
  • Notes: Code is very clean. Uses the PaliGemma-3B LLM with a binned-token action head (see the binning sketch below).
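
A "binned-token action head" here refers to the common scheme of discretizing each continuous action dimension into a fixed number of uniform bins and predicting the bin index as a token. A minimal sketch of that general idea (the bin count, action ranges, and function names below are illustrative assumptions, not SpatialVLA's actual code):

```python
import numpy as np

# Discretize continuous actions into N uniform bins so they can be emitted
# as tokens by the language model. Bin count and ranges are illustrative.
NUM_BINS = 256

def actions_to_bins(actions, low, high, num_bins=NUM_BINS):
    """Map continuous actions in [low, high] to integer bin indices."""
    actions = np.clip(actions, low, high)
    normalized = (actions - low) / (high - low)            # -> [0, 1]
    return np.minimum((normalized * num_bins).astype(int), num_bins - 1)

def bins_to_actions(bins, low, high, num_bins=NUM_BINS):
    """Map bin indices back to the continuous bin-center values."""
    return low + (bins + 0.5) / num_bins * (high - low)

# Example: a 7-DoF action (end-effector delta pose + gripper), hypothetical values.
low, high = -1.0, 1.0
action = np.array([0.12, -0.05, 0.30, 0.0, 0.0, -0.25, 1.0])
tokens = actions_to_bins(action, low, high)
recovered = bins_to_actions(tokens, low, high)
```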

OpenVLA-OFT

  • Website: https://openvla-oft.github.io
  • Status: ✅ Successfully reproduced the results in the paper
  • Notes: Uses the Llama 2 7B LLM with an MLP or diffusion action head.

Pi0

  • Achieves ~94% average success rate on the LIBERO benchmark (Reference).
  • Model details: Uses PaliGemma-3B as the LLM and DiT for the action head.

Real-world observations

  1. Fine-tuning on an A100 performs well with only 80 samples.
  2. Scales well with 3–4k high-quality samples. Successful fine-tuning of the
    Hugging Face checkpoint using:
    • bf16
    • batch size = 12
    • ~70 GB VRAM, 8× H100, ~15 hours
    • multi-machine setup
    • DeepSpeed ZeRO-2 (no offloading); a minimal config sketch follows this list
      Training from scratch fails when data is limited.
  3. The pi0-fast variant works effectively in this paper.
    Project site: Physical Intelligence – pi0-fast
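
For reference, a minimal DeepSpeed ZeRO-2 configuration sketch consistent with the settings above (bf16, stage 2, no offloading). The per-GPU micro-batch and gradient-accumulation values are assumptions, not the exact config used for this fine-tuning run:

```python
import json

# Sketch of a DeepSpeed ZeRO-2 config matching the notes above.
# Assumption: batch size 12 is the per-GPU micro-batch with no gradient
# accumulation; adjust to match your actual global batch size.
ds_config = {
    "train_micro_batch_size_per_gpu": 12,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
        # No "offload_optimizer" / "offload_param" entries -> no offloading.
    },
    "gradient_clipping": 1.0,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```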

ACT


Diffusion Policy


SmolVLA

  • Paper: https://arxiv.org/abs/2506.01844
  • Successfully tested the 450M checkpoint on the LeRobot SO101 for a real-world fork-picking task. Training parameters: batch size 12, 4.1 GB VRAM usage; converges between 3,000 and 27,000 steps.

GR00T N1.5


UniVLA

Note: There are hundreds of VLA models available. This list focuses on models that I have personally tested or for which reproduction results have been reported somewhere.

Benchmarks

✅ Tested Benchmarks

🔄 Benchmarks to Try

Datasets

✅ Tested Datasets

Open X-Embodiment

Dataset Breakdown (disk usage per sub-dataset; a loading sketch follows the listing):

1.2T    ./fmb_dataset
126G    ./taco_play
128G    ./bc_z
124G    ./bridge_orig                                            # 2.1M samples
140G    ./furniture_bench_dataset_converted_externally_to_rlds
98G     ./fractal20220817_data                               # 3.78M samples
70G     ./kuka
22G     ./dobbe
20G     ./berkeley_autolab_ur5
16G     ./stanford_hydra_dataset_converted_externally_to_rlds
16G     ./utaustin_mutex
14G     ./austin_sailor_dataset_converted_externally_to_rlds
13G     ./nyu_franka_play_dataset_converted_externally_to_rlds
11G     ./toto
8.0G    ./austin_sirius_dataset_converted_externally_to_rlds
5.9G    ./iamlab_cmu_pickup_insert_converted_externally_to_rlds
4.5G    ./roboturk
3.3G    ./berkeley_cable_routing
3.2G    ./viola
3.0G    ./jaco_play
2.5G    ./berkeley_fanuc_manipulation
1.2G    ./austin_buds_dataset_converted_externally_to_rlds
510M    ./cmu_stretch
263M    ./dlr_edan_shared_control_converted_externally_to_rlds
110M    ./ucsd_kitchen_dataset_converted_externally_to_rlds
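
The directories above are RLDS datasets in TFDS format, so they can be inspected with TensorFlow Datasets. A minimal loading sketch (the version subdirectory and per-step keys are assumptions and vary by sub-dataset):

```python
import tensorflow_datasets as tfds

# Load one Open X-Embodiment sub-dataset from its local RLDS/TFDS directory.
# The path is an example; check builder.info before relying on specific keys.
builder = tfds.builder_from_directory("./bridge_orig/1.0.0")
print(builder.info)

ds = builder.as_dataset(split="train", shuffle_files=True)
for episode in ds.take(1):
    # RLDS stores each episode's timesteps as a nested dataset under "steps".
    for step in episode["steps"].take(3):
        print(step.keys())
```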

GR00T Teleop Simulation Dataset

Droid

CALVIN

RoboTwin

BEHAVIOR-1K

Contributing

We welcome contributions! Please feel free to:

  • Add new VLA models you've tested
  • Share benchmarks that are easy to use
  • Report dataset experiences
  • Submit pull requests or issues

Contact

Feel free to send pull requests, open issues, or email us to share your reproduction experience!
