Lixin Yang · Licheng Zhong · Pengxiang Zhu · Xinyu Zhan · Junxiao Kong · Jian Xu · Cewu Lu
- Supports reconstruction of both left and right hands.
- Produces absolute metric output and is robust to occlusion.
- Supports human-hand teleoperation.
POEM (POint-EMbed Multi-view Transformer) v2 is a generalizable multi-view hand mesh recovery model designed for seamless use in real-world hand MoCap & teleoperation.
It is flexible: it works with any number, order, or arrangement of cameras, as long as the cameras
- share overlapping views,
- see the hand in at least some of the views,
- have calibrated extrinsics.
It is robust to occlusion: it handles occlusion and partial visibility by leveraging the views in which the hand remains visible.
It produces absolute hand positions: it directly recovers hand-surface vertices in real-world (meter) units, referenced to the first camera's coordinate system.
It supports both left and right hands: although trained on right-hand data, it can also handle the left hand via a world-mirroring process (horizontally flipping all images and mirroring the camera extrinsics across the first camera's Y-Z plane); see the sketch below.
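To make the world-mirroring idea concrete, here is a minimal sketch, assuming camera-to-world extrinsics and pinhole intrinsics stored as 4x4 / 3x3 NumPy arrays. The function and variable names are illustrative only and are not the actual API of this repo; the repo's own implementation may differ in conventions.

```python
import numpy as np

def mirror_world(images, T_c2w_list, K_list):
    """Hypothetical helper: flip images horizontally and mirror all cameras
    across the first camera's Y-Z plane so a left hand appears as a right hand."""
    F = np.diag([-1.0, 1.0, 1.0, 1.0])      # negates the X axis of a camera frame
    T0 = T_c2w_list[0]                      # camera-0 -> world
    S = T0 @ F @ np.linalg.inv(T0)          # world reflection across camera-0's Y-Z plane
    # Reflect the world, then flip each camera's X axis so the rotation part
    # remains a proper rotation (determinant +1).
    mirrored_T = [S @ T @ F for T in T_c2w_list]
    mirrored_imgs = [img[:, ::-1].copy() for img in images]   # horizontal flip
    # The principal point must follow the flip: cx -> (W - 1) - cx.
    mirrored_K = []
    for K, img in zip(K_list, images):
        K = K.copy()
        K[0, 2] = img.shape[1] - 1 - K[0, 2]
        mirrored_K.append(K)
    return mirrored_imgs, mirrored_T, mirrored_K
```

The mirrored inputs can then be fed to the right-hand model, and the predicted mesh reflected back through the same plane to recover the left hand.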
We provide a real-world demonstration for running our model.
Download the example data from huggingface. The tarball includes multi-view video of manipulation captured in a laboratory setting, along with the corresponding camera intrinsics and extrinsics, hand poses, and hand side information.
In the file tool/infer_hand.py, modify the path prefix (/prefix/data/) to the full path of the directory where the data has been extracted.
DATA_FILEDIR = "/prefix/data/data" # Modify /prefix/data to where example data is extracted
MASK_FILEDIR = "/prefix/data/human_mask_hand"
CALIB_FILEDIR = "/prefix/data/calib/calib__2025_0319_1534_41"
HAND_SIDE_FILEPATH = "/prefix/data/hand_labels.json"
The visualization command (you need to install our environment first; see docs/installation.md):
python -m tool.infer_hand -c config/release/eval_single.yaml --reload ./checkpoints/medium.pth.tar -g 0
As a multi-view method, POEM-v2 relies on the camera extrinsic matrices to make its predictions. In tool/infer_hand.py, we require the N extrinsic matrices relating each camera coordinate system (c) to the world coordinate system (w), where N is the number of views.
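As a rough illustration of how calibrated extrinsics can be prepared for the model (this sketch assumes 4x4 world-to-camera matrices; the convention and helper name are assumptions, not this repo's actual API), the N matrices can be stacked and re-referenced to the first camera, which is the frame in which the absolute output is expressed:

```python
import numpy as np

def stack_and_rereference(T_w2c_list):
    """T_w2c_list: list of N (4, 4) world-to-camera extrinsic matrices."""
    T_w2c = np.stack(T_w2c_list, axis=0)    # (N, 4, 4)
    T_c0_to_w = np.linalg.inv(T_w2c[0])     # first camera -> world
    # Transforms mapping points from camera-0 coordinates into each camera i's
    # coordinates; the first entry is the identity.
    T_c0_to_ci = T_w2c @ T_c0_to_w          # (N, 4, 4)
    return T_c0_to_ci
```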
- See docs/installation.md to set up the environment and install all the required packages.
- See docs/datasets.md to download all the datasets and additional assets required.
We provide four models with different configurations for training and evaluation. We have evaluated the models on multiple datasets.
- Set ${MODEL} to one of [small, medium, medium_MANO, large].
- Set ${DATASET} to one of [HO3D, DexYCB, Arctic, Interhand, Oakink, Freihand].

Download the pretrained checkpoints at 🔗ckpt_release and move the contents to ./checkpoints.
Common command-line options:

- -g, --gpu_id: visible GPUs for training, e.g. -g 0,1,2,3. Evaluation only supports a single GPU.
- -w, --workers: num_workers for reading data, e.g. -w 4.
- -p, --dist_master_port: port for distributed training, e.g. -p 60011. Set a different -p for each concurrent training process.
- -b, --batch_size: e.g. -b 32. The default is specified in the config file, but it is overwritten when -b is provided.
- --cfg: config file for this experiment, e.g. --cfg config/release/train_${MODEL}.yaml.
- --exp_id: name of the experiment, e.g. --exp_id ${EXP_ID}. When --exp_id is provided, the code requires that no uncommitted change remains in the git repo. Otherwise, it defaults to 'default' for training and 'eval_{cfg}' for evaluation. All results will be saved in exp/${EXP_ID}_{timestamp}.
- --reload: path to the checkpoint (.pth.tar) to be loaded.
To provide a holistic benchmark, we compare POEM-v2 with state-of-the-art single-view 3D hand reconstruction frameworks. Since the absolute position of the hand is ambiguous in a single-view setting, we only report MPJPE and MPVPE under Procrustes Alignment.
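For reference, a minimal sketch of the Procrustes-aligned MPJPE referenced above (the standard similarity alignment via the SVD/Umeyama solution; PA-MPVPE is the same computation applied to mesh vertices; this is not necessarily the exact implementation used in this repo):

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """pred, gt: (J, 3) joint positions in meters. Returns PA-MPJPE in meters."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g                       # center both point sets
    H = p.T @ g                                         # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                                  # optimal rotation
    scale = np.trace(D @ np.diag(S)) / (p ** 2).sum()   # optimal isotropic scale
    aligned = scale * p @ R.T + mu_g                    # align prediction onto GT
    return float(np.linalg.norm(aligned - gt, axis=1).mean())
```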
We perform this comparison on the official HO3D test sets, v2 and v3. The test-set ground truth can now be downloaded from the official repo (updated Nov 3rd, 2024).
├── HO3D_v2
├── HO3D_v2_official_gt
│ ├── evaluation_verts.json
│ └── evaluation_xyz.json
├── HO3D_v3
├── HO3D_v3_manual_test_gt
│   ├── evaluation_verts.json
│   └── evaluation_xyz.json
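Optionally, a quick standalone check (a helper sketch, not part of the repo; the root path below is a placeholder) that the ground-truth files are in the layout shown above:

```python
import os

HO3D_GT_ROOT = "/path/to/ho3d_gt"   # placeholder: directory containing the folders above
for gt_dir in ("HO3D_v2_official_gt", "HO3D_v3_manual_test_gt"):
    for fname in ("evaluation_verts.json", "evaluation_xyz.json"):
        path = os.path.join(HO3D_GT_ROOT, gt_dir, fname)
        print(("found   " if os.path.isfile(path) else "MISSING ") + path)
```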
Then run the following command to get the results:
# HO3D_VERSION can be set to 2 or 3.
$ python scripts/eval_ho3d_official.py --ho3d-v ${HO3D_VERSION} \
    --cfg config/release/eval_single.yaml \
    --model large \
    --reload ${PATH_TO_POEM_LARGE_CKPT} \
    --eval_extra ho3d_offi
You should then obtain the results reported in the paper.
Set ${PATH_TO_CKPT} to ./checkpoints/${MODEL}.pth.tar, then run the following command. Note that we essentially modify the config file in place to suit different configuration settings. view_min and view_max specify the range of views fed into the model. Use the --draw option to render the results; note that it is incompatible with the computation of the AUC metric.
$ python scripts/eval_single.py --cfg config/release/eval_single.yaml \
    -g ${gpu_id} \
    --reload ${PATH_TO_CKPT} \
    --dataset ${DATASET} \
    --view_min ${MIN_VIEW} \
    --view_max ${MAX_VIEW} \
    --model ${MODEL}
The evaluation results will be saved at exp/${EXP_ID}_{timestamp}/evaluations.
We use a mixture of multiple datasets, packed with webdataset, for training. Execute the following command to train a specific model on the provided datasets.
$ python scripts/train_ddp_wds.py --cfg config/release/train_${MODEL}.yaml -g 0,1,2,3 -w 4
Monitor training with TensorBoard:
$ cd exp/${EXP_ID}_{timestamp}/runs/
$ tensorboard --logdir .
All the checkpoints during training are saved at exp/${EXP_ID}_{timestamp}/checkpoints/, where ../checkpoints/checkpoint records the most recent checkpoint.
This code and model are available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using the code and model you agree to the terms in the LICENSE.
@misc{yang2024multiviewhandreconstructionpointembedded,
title={Multi-view Hand Reconstruction with a Point-Embedded Transformer},
author={Lixin Yang and Licheng Zhong and Pengxiang Zhu and Xinyu Zhan and Junxiao Kong and Jian Xu and Cewu Lu},
year={2024},
eprint={2408.10581},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.10581},
}
For more questions, please contact Lixin Yang: [email protected]