A curated list of Vision-Language-Action (VLA) models, benchmarks, and datasets for robotic manipulation and embodied AI.
- Paper: https://arxiv.org/abs/2501.15830
- Status: ✅ Successfully reproduced the results in the paper
- Notes: Very clean codebase. Uses a PaliGemma-3B LLM with a binned-token action head (see the sketch below).
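For readers unfamiliar with the binned-token approach, here is a minimal sketch of the general idea: continuous actions are discretized into a fixed number of bins so the LLM can predict them as ordinary tokens. The bin count and action range below are illustrative assumptions, not this paper's exact values.

```python
import numpy as np

N_BINS = 256           # assumed token vocabulary reserved for actions
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def actions_to_tokens(actions: np.ndarray) -> np.ndarray:
    """Map continuous actions in [LOW, HIGH] to integer bin ids in [0, N_BINS-1]."""
    clipped = np.clip(actions, LOW, HIGH)
    return np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).astype(np.int64)

def tokens_to_actions(tokens: np.ndarray) -> np.ndarray:
    """Invert the discretization (up to quantization error)."""
    return tokens.astype(np.float64) / (N_BINS - 1) * (HIGH - LOW) + LOW

a = np.array([0.03, -0.52, 0.99])  # e.g. one 3-DoF action
t = actions_to_tokens(a)
print(t, tokens_to_actions(t))     # round-trips within one bin width
```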
- Website: https://openvla-oft.github.io
- Status: ✅ Successfully reproduced the results in the paper
- Notes: Uses a Llama-2-7B LLM with an MLP or diffusion action head (an MLP head sketch follows this entry).
- Achieves ~94% average success rate on the LIBERO benchmark.
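As a companion to the note above, this is a minimal PyTorch sketch of what an MLP action head generally looks like: a small regressor from the VLM's final hidden state to a continuous action chunk. All dimensions are illustrative assumptions, not OpenVLA-OFT's actual configuration.

```python
import torch
import torch.nn as nn

class MLPActionHead(nn.Module):
    """Regress a whole action chunk from a single hidden state (sketch)."""
    def __init__(self, hidden_dim=4096, action_dim=7, chunk_len=8):
        super().__init__()
        self.chunk_len, self.action_dim = chunk_len, action_dim
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 1024), nn.GELU(),
            nn.Linear(1024, 1024), nn.GELU(),
            nn.Linear(1024, chunk_len * action_dim),  # one pass -> whole chunk
        )

    def forward(self, h):  # h: (batch, hidden_dim)
        return self.net(h).view(-1, self.chunk_len, self.action_dim)

head = MLPActionHead()
actions = head(torch.randn(2, 4096))  # -> (2, 8, 7)
```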
- Model details: Uses PaliGemma-3B as the LLM and a DiT action head.
- Performs well with only 80 samples, fine-tuned on an A100.
- Scales with 3–4k high-quality samples. We successfully fine-tuned the Hugging Face model using:
  - bf16 precision
  - batch size = 12
  - ~70GB VRAM, 8× H100, 15 hours
  - multi-machine setup
  - DeepSpeed ZeRO-2 with no offloading (a config sketch follows this entry)
- Training from scratch fails when data is limited.
- The pi0-fast variant works effectively in this paper.
- Project site: Physical Intelligence – pi0-fast
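The fine-tuning settings above map directly onto a DeepSpeed configuration. Below is a sketch of a ZeRO-2 config matching those notes (bf16, batch size 12, no offloading); any field beyond what the notes state is an assumption, so check the DeepSpeed documentation for your version.

```python
# A sketch of a DeepSpeed ZeRO-2 config matching the notes above.
ds_config = {
    "train_micro_batch_size_per_gpu": 12,  # batch size from the notes
    "bf16": {"enabled": True},             # bf16 precision from the notes
    "zero_optimization": {
        "stage": 2,                        # ZeRO-2: shard optimizer state + gradients
        # No "offload_optimizer" block: everything stays on GPU, per the notes.
        "overlap_comm": True,              # assumed; common default
        "contiguous_gradients": True,      # assumed; common default
    },
    "gradient_clipping": 1.0,              # assumed; not stated in the notes
}
# Typically passed to deepspeed.initialize(model=..., config=ds_config) and
# launched across machines with the deepspeed launcher for multi-node runs.
```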
- Paper: ACT: Action Chunking with Transformers
- Note: Can produce smooth grasping behavior with as few as 70 demonstrations (a temporal-ensembling sketch follows this entry).
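ACT's two key ideas are action chunking (predict K future actions at once) and temporal ensembling (average the overlapping predictions for the current step with exponential weights). The sketch below illustrates both; `policy` is a hypothetical callable returning a (K, action_dim) array, and the chunk length and decay constant are illustrative, not the paper's tuned values.

```python
import numpy as np

K = 20    # chunk length (illustrative)
M = 0.1   # weight decay for older predictions (illustrative)

def temporal_ensemble(preds):
    """preds: actions predicted for the current timestep, oldest first."""
    w = np.exp(-M * np.arange(len(preds)))  # w[0] weights the oldest prediction
    w /= w.sum()
    return (w[:, None] * np.asarray(preds)).sum(axis=0)

buffer = []  # (start_step, chunk) pairs accumulated during the rollout

def act(policy, obs, t):
    """Predict a new chunk, then ensemble every prediction covering step t."""
    buffer.append((t, policy(obs)))  # chunk: (K, action_dim)
    preds = [chunk[t - s] for s, chunk in buffer if 0 <= t - s < K]
    return temporal_ensemble(preds)
```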
- GitHub: real-stanford/diffusion_policy
- Note: Works effectively in this paper. In one case, training on 30 trajectories for 30 hours on a single RTX 4090 yielded a high success rate (a denoising sketch follows this entry).
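For context on how a diffusion policy emits actions, here is a generic DDPM-style denoising loop; the actual repository builds its policies around standard diffusion schedulers, and `eps_model`, the step count, and the beta schedule below are illustrative assumptions.

```python
import torch

T = 100                                # denoising steps (illustrative)
betas = torch.linspace(1e-4, 2e-2, T)  # assumed linear noise schedule
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_actions(eps_model, obs, action_shape):
    """Iteratively denoise Gaussian noise into an action chunk, given obs."""
    x = torch.randn(action_shape)                       # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, obs, t)                      # predicted noise
        x = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x += betas[t].sqrt() * torch.randn_like(x)  # DDPM posterior noise
    return x
```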
- Paper: https://arxiv.org/abs/2506.01844
- Successfully tested the 450M checkpoint on a LeRobot SO101 arm for real-world fork-picking tasks. Training setup: batch size 12, 4.1GB VRAM, convergence between 3,000 and 27,000 steps.
- Paper: https://arxiv.org/abs/2503.14734
- Blog: https://research.nvidia.com/labs/gear/gr00t-n1_5/
- Status: ✅ Successfully reproduced the RoboCasa results.
- Notes: Uses a 3B LLM with a flow-matching DiT action head (a flow-matching sketch follows this entry).
- GitHub: https://github.com/NVIDIA/Isaac-GR00T
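The flow-matching objective behind heads like this fits in a few lines. Below is a sketch of the standard rectified-flow formulation, under the assumption that GR00T's head follows it closely; `velocity_model` is a placeholder for the DiT.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(velocity_model, actions, obs):
    """actions: (B, K, D) ground-truth chunk; obs: conditioning features."""
    noise = torch.randn_like(actions)       # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1, 1)  # per-sample time in [0, 1]
    x_t = (1 - t) * noise + t * actions     # linear interpolation path
    target_v = actions - noise              # dx_t/dt along that path
    pred_v = velocity_model(x_t, t.squeeze(), obs)
    return F.mse_loss(pred_v, target_v)     # regress the velocity field
```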
- Paper: https://arxiv.org/pdf/2505.06111
- Status: ✅ Successfully reproduced the results in the paper
- GitHub: https://github.com/OpenDriveLab/UniVLA
Note: There are hundreds of VLA models available. This list focuses on models that I have personally tested or for which reproduction results have been reported somewhere.
- LIBERO
- SimplerEnv
- Enhanced repository: https://github.com/DelinQu/SimplerEnv-OpenVLA, which provides additional scripts and utilities for running the benchmark (a quick-start sketch follows).
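Here is a quick-start sketch for SimplerEnv, based on the usage pattern shown in the repository's README; verify the exact API against the current code.

```python
import simpler_env

env = simpler_env.make("google_robot_pick_coke_can")  # one of the bundled tasks
obs, reset_info = env.reset()
instruction = env.get_language_instruction()          # the task, in language
done, truncated = False, False
while not (done or truncated):
    action = env.action_space.sample()                # replace with your policy
    obs, reward, done, truncated, info = env.step(action)
```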
- RoboCasa
- RLBench
- CALVIN
- Repository: https://github.com/mees/calvin
- Meta-World
- Website: https://meta-world.github.io/
- RoboTwin
- GenieSim
- Repository: https://github.com/AgibotTech/genie_sim
- Notes: Only supports a Docker environment and is difficult to use; it also requires adjusting the ROS setup.
- BEHAVIOR-1K
- Open X-Embodiment
- Website: https://robotics-transformer-x.github.io/
- Code: https://github.com/kpertsch/rlds_dataset_mod
- Total size: ~2.0TB (an RLDS loading sketch follows the breakdown below)
Dataset Breakdown:
1.2T ./fmb_dataset
126G ./taco_play
128G ./bc_z
124G ./bridge_orig # 2.1M samples
140G ./furniture_bench_dataset_converted_externally_to_rlds
98G ./fractal20220817_data # 3.78M samples
70G ./kuka
22G ./dobbe
20G ./berkeley_autolab_ur5
16G ./stanford_hydra_dataset_converted_externally_to_rlds
16G ./utaustin_mutex
14G ./austin_sailor_dataset_converted_externally_to_rlds
13G ./nyu_franka_play_dataset_converted_externally_to_rlds
11G ./toto
8.0G ./austin_sirius_dataset_converted_externally_to_rlds
5.9G ./iamlab_cmu_pickup_insert_converted_externally_to_rlds
4.5G ./roboturk
3.3G ./berkeley_cable_routing
3.2G ./viola
3.0G ./jaco_play
2.5G ./berkeley_fanuc_manipulation
1.2G ./austin_buds_dataset_converted_externally_to_rlds
510M ./cmu_stretch
263M ./dlr_edan_shared_control_converted_externally_to_rlds
110M ./ucsd_kitchen_dataset_converted_externally_to_rlds
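These subsets are stored in RLDS format, which loads through TensorFlow Datasets roughly as sketched below; the local path, version directory, and feature keys are assumptions that vary per subset, so check your local copy.

```python
import tensorflow_datasets as tfds

# Assumed download location and version directory.
builder = tfds.builder_from_directory("/data/oxe/bridge_orig/1.0.0")
ds = builder.as_dataset(split="train")
for episode in ds.take(1):
    for step in episode["steps"]:             # RLDS: each episode holds a "steps" dataset
        image = step["observation"]["image"]  # feature keys differ between subsets
        action = step["action"]
```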
- GR00T Teleop Sim
- Website: https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GR00T-Teleop-Sim
- Notes: 1,000 teleoperation trajectories for each of the 24 tabletop tasks. Total storage: ~14GB in HDF5 format, ~39GB as a LeRobot-style dataset (an HDF5 inspection sketch follows this entry).
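To get oriented in the HDF5 variant, a small inspection script like the one below prints every dataset in a file; the file name is hypothetical, and the internal layout should be discovered this way rather than assumed.

```python
import h5py

with h5py.File("teleop_task_00.hdf5", "r") as f:  # hypothetical file name
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):         # skip groups, print datasets
            print(name, obj.shape, obj.dtype)
    f.visititems(show)                            # walks the whole file tree
```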
- DROID
- Website: https://droid-dataset.github.io/
- Notes: 1.7TB dataset of 1000+ teleoperation sessions covering 100+ unique tasks.
- CALVIN
- Repository: https://github.com/mees/calvin
- Size: ~1.1TB
We welcome contributions! Please feel free to:
- Add new VLA models you've tested
- Share benchmarks that are easy to use
- Report your experiences with datasets
- Submit pull requests or issues
Feel free to send us pull requests, issues, or an email to share your reproduction experience!