Lixin Yang · Licheng Zhong · Pengxiang Zhu · Xinyu Zhan · Junxiao Kong · Jian Xu · Cewu Lu
- Supports reconstruction of both left and right hands.
- Produces absolute metric output and is robust to occlusion.
- Supports human-hand teleoperation.
POEM (POint-EMbed Multi-view Transformer) v2 is a generalizable multi-view hand mesh recovery model designed for seamless use in real-world hand MoCap & teleoperation.
It is flexible: it works with any number, order, or arrangement of cameras, as long as the cameras
- share overlapping views,
- see the hand in at least some of the views,
- have calibrated extrinsics.
It is robust to occlusion: it handles occlusion and partial visibility by leveraging the views in which the hand remains visible.
It produces absolute hand positions: it directly recovers hand-surface vertices in real-world (meter) units, referenced to the first camera's coordinate system.
It supports both left and right hands: although trained on right-hand data, it can also handle the left hand via a world-mirroring process (horizontally flipping all images and mirroring the camera extrinsics across the first camera's Y-Z plane); see the sketch below.
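To make the world-mirroring idea concrete, here is a minimal sketch, assuming camera-to-world extrinsics and pinhole intrinsics stored as 4x4 / 3x3 NumPy arrays. The function and variable names are illustrative only and are not the actual API of this repo; the repo's own implementation may differ in conventions.

```python
import numpy as np

def mirror_world(images, T_c2w_list, K_list):
    """Hypothetical helper: flip images horizontally and mirror all cameras
    across the first camera's Y-Z plane so a left hand appears as a right hand."""
    F = np.diag([-1.0, 1.0, 1.0, 1.0])      # negates the X axis of a camera frame
    T0 = T_c2w_list[0]                      # camera-0 -> world
    S = T0 @ F @ np.linalg.inv(T0)          # world reflection across camera-0's Y-Z plane
    # Reflect the world, then flip each camera's X axis so the rotation part
    # remains a proper rotation (determinant +1).
    mirrored_T = [S @ T @ F for T in T_c2w_list]
    mirrored_imgs = [img[:, ::-1].copy() for img in images]   # horizontal flip
    # The principal point must follow the flip: cx -> (W - 1) - cx.
    mirrored_K = []
    for K, img in zip(K_list, images):
        K = K.copy()
        K[0, 2] = img.shape[1] - 1 - K[0, 2]
        mirrored_K.append(K)
    return mirrored_imgs, mirrored_T, mirrored_K
```

The mirrored inputs can then be fed to the right-hand model, and the predicted mesh reflected back through the same plane to recover the left hand.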
We provide a real-world demonstration for running our model.
Download the example data from huggingface. The tarball includes multi-view video of manipulation captured in a laboratory setting, along with the corresponding camera intrinsics and extrinsics, hand poses, and hand side information.
In the file tool/infer_hand.py, modify the path prefix (/prefix/data/) to the full path of the directory where the data has been extracted.
DATA_FILEDIR = "/prefix/data/data" # Modify /prefix/data to where example data is extracted
MASK_FILEDIR = "/prefix/data/human_mask_hand"
CALIB_FILEDIR = "/prefix/data/calib/calib__2025_0319_1534_41"
HAND_SIDE_FILEPATH = "/prefix/data/hand_labels.json"
The visualization command (you need to install our environment first; see docs/installation.md):
python -m tool.infer_hand -c config/release/eval_single.yaml --reload ./checkpoints/medium.pth.tar -g 0
As a multi-view method, POEM-v2 relies on the camera extrinsic matrices to make its predictions. In tool/infer_hand.py, we require the N extrinsic matrices relating each camera coordinate system (c) to the world coordinate system (w), where N is the number of views.
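As a rough illustration of how calibrated extrinsics can be prepared for the model (this sketch assumes 4x4 world-to-camera matrices; the convention and helper name are assumptions, not this repo's actual API), the N matrices can be stacked and re-referenced to the first camera, which is the frame in which the absolute output is expressed:

```python
import numpy as np

def stack_and_rereference(T_w2c_list):
    """T_w2c_list: list of N (4, 4) world-to-camera extrinsic matrices."""
    T_w2c = np.stack(T_w2c_list, axis=0)    # (N, 4, 4)
    T_c0_to_w = np.linalg.inv(T_w2c[0])     # first camera -> world
    # Transforms mapping points from camera-0 coordinates into each camera i's
    # coordinates; the first entry is the identity.
    T_c0_to_ci = T_w2c @ T_c0_to_w          # (N, 4, 4)
    return T_c0_to_ci
```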
- See docs/installation.md to set up the environment and install all the required packages.
- See docs/datasets.md to download all the datasets and additional assets required.
We provide four models with different configurations for training and evaluation. We have evaluated the models on multiple datasets.
- Set ${MODEL} to one of [small, medium, medium_MANO, large].
- Set ${DATASET} to one of [HO3D, DexYCB, Arctic, Interhand, Oakink, Freihand].

Download the pretrained checkpoints at 🔗ckpt_release and move the contents to ./checkpoints.
Common command-line options:

- -g, --gpu_id: visible GPUs for training, e.g. -g 0,1,2,3. Evaluation only supports a single GPU.
- -w, --workers: num_workers for reading data, e.g. -w 4.
- -p, --dist_master_port: port for distributed training, e.g. -p 60011. Set a different -p for each concurrent training process.
- -b, --batch_size: e.g. -b 32. The default is specified in the config file, but it is overwritten when -b is provided.
- --cfg: config file for this experiment, e.g. --cfg config/release/train_${MODEL}.yaml.
- --exp_id: name of the experiment, e.g. --exp_id ${EXP_ID}. When --exp_id is provided, the code requires that no uncommitted change remains in the git repo. Otherwise, it defaults to 'default' for training and 'eval_{cfg}' for evaluation. All results will be saved in exp/${EXP_ID}_{timestamp}.
- --reload: path to the checkpoint (.pth.tar) to be loaded.
To provide a holistic benchmark, we compare POEM-v2 with state-of-the-art single-view 3D hand reconstruction frameworks. Since the absolute position of the hand is ambiguous in a single-view setting, we only report MPJPE and MPVPE under Procrustes Alignment.
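For reference, a minimal sketch of the Procrustes-aligned MPJPE referenced above (the standard similarity alignment via the SVD/Umeyama solution; PA-MPVPE is the same computation applied to mesh vertices; this is not necessarily the exact implementation used in this repo):

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """pred, gt: (J, 3) joint positions in meters. Returns PA-MPJPE in meters."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g                       # center both point sets
    H = p.T @ g                                         # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                                  # optimal rotation
    scale = np.trace(D @ np.diag(S)) / (p ** 2).sum()   # optimal isotropic scale
    aligned = scale * p @ R.T + mu_g                    # align prediction onto GT
    return float(np.linalg.norm(aligned - gt, axis=1).mean())
```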
We perform this comparison on the official HO3D test sets, v2 and v3. The test-set ground truth can now be downloaded from the official repo (updated Nov 3rd, 2024).
├── HO3D_v2
├── HO3D_v2_official_gt
│ ├── evaluation_verts.json
│ └── evaluation_xyz.json
├── HO3D_v3
├── HO3D_v3_manual_test_gt
│   ├── evaluation_verts.json
│   └── evaluation_xyz.json
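Optionally, a quick standalone check (a helper sketch, not part of the repo; the root path below is a placeholder) that the ground-truth files are in the layout shown above:

```python
import os

HO3D_GT_ROOT = "/path/to/ho3d_gt"   # placeholder: directory containing the folders above
for gt_dir in ("HO3D_v2_official_gt", "HO3D_v3_manual_test_gt"):
    for fname in ("evaluation_verts.json", "evaluation_xyz.json"):
        path = os.path.join(HO3D_GT_ROOT, gt_dir, fname)
        print(("found   " if os.path.isfile(path) else "MISSING ") + path)
```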
Then run the following command to get the results:
# HO3D_VERSION can be set to 2 or 3.
$ python scripts/eval_ho3d_official.py --ho3d-v ${HO3D_VERSION} \
    --cfg config/release/eval_single.yaml \
    --model large \
    --reload ${PATH_TO_POEM_LARGE_CKPT} \
    --eval_extra ho3d_offi
You should then obtain the results reported in the paper.
Set ${PATH_TO_CKPT} to ./checkpoints/${MODEL}.pth.tar, then run the following command. Note that we essentially modify the config file in place to suit different configuration settings. view_min and view_max specify the range of views fed into the model. Use the --draw option to render the results; note that it is incompatible with the computation of the AUC metric.
$ python scripts/eval_single.py --cfg config/release/eval_single.yaml \
    -g ${gpu_id} \
    --reload ${PATH_TO_CKPT} \
    --dataset ${DATASET} \
    --view_min ${MIN_VIEW} \
    --view_max ${MAX_VIEW} \
    --model ${MODEL}
The evaluation results will be saved at exp/${EXP_ID}_{timestamp}/evaluations.
We use a mixture of multiple datasets, packed with webdataset, for training. Execute the following command to train a specific model on the provided datasets.
$ python scripts/train_ddp_wds.py --cfg config/release/train_${MODEL}.yaml -g 0,1,2,3 -w 4
Monitor training with TensorBoard:
$ cd exp/${EXP_ID}_{timestamp}/runs/
$ tensorboard --logdir .
All the checkpoints during training are saved at exp/${EXP_ID}_{timestamp}/checkpoints/, where ../checkpoints/checkpoint records the most recent checkpoint.
This code and model are available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using the code and model you agree to the terms in the LICENSE.
@misc{yang2024multiviewhandreconstructionpointembedded,
title={Multi-view Hand Reconstruction with a Point-Embedded Transformer},
author={Lixin Yang and Licheng Zhong and Pengxiang Zhu and Xinyu Zhan and Junxiao Kong and Jian Xu and Cewu Lu},
year={2024},
eprint={2408.10581},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.10581},
}
For more questions, please contact Lixin Yang: [email protected]