Paper | Sup. material | Video
This repo contains the code for the scene completion diffusion method proposed in the CVPR'24 paper: "Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion".
Our method formulates the diffusion process as a point-wise local problem, disentangling the scene data distribution during the diffusion process and learning only the local neighborhood distribution of each point. With this formulation we can achieve a complete scene representation from a single LiDAR scan, operating directly over the 3D points.
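To make the idea concrete, below is a minimal, illustrative sketch of a point-wise denoising loss in this spirit; it is not the repository's training code, and `model`, `scan_cond`, and `alphas_cumprod` are placeholder names:

```python
import torch

def pointwise_diffusion_loss(model, points, scan_cond, alphas_cumprod):
    """Illustrative DDPM-style loss over (N, 3) point coordinates: Gaussian
    noise is added per point, and the model regresses it conditioned on the
    input scan (placeholder interfaces, not the repo's actual API)."""
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (1,), device=points.device)   # random timestep
    a_bar = alphas_cumprod[t]                             # cumulative alpha at t
    noise = torch.randn_like(points)                      # per-point noise
    noisy = a_bar.sqrt() * points + (1.0 - a_bar).sqrt() * noise
    pred = model(noisy, t, scan_cond)                     # predicted per-point noise
    return torch.nn.functional.mse_loss(pred, noise)
```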
Installing the Python package prerequisites (we have used Python 3.8):

```
sudo apt install build-essential python3-dev libopenblas-dev
pip3 install -r requirements.txt
```
Installing MinkowskiEngine:

```
pip3 install -U MinkowskiEngine==0.5.4 --install-option="--blas=openblas" -v --no-deps
```
To set up the code, run the following command in the repository's main directory:

```
pip3 install -U -e .
```
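A quick way to verify the installation is to import the core dependencies (a minimal sanity check, assuming a CUDA-capable PyTorch build):

```python
# Minimal environment sanity check for the dependencies installed above.
import torch
import MinkowskiEngine as ME

print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
print("MinkowskiEngine:", ME.__version__)
```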
The SemanticKITTI dataset has to be downloaded from the official site and extracted in the following structure:
```
./lidiff/
└── Datasets/
    └── SemanticKITTI
        └── dataset
            └── sequences
                ├── 00/
                │   ├── velodyne/
                │   │   ├── 000000.bin
                │   │   ├── 000001.bin
                │   │   └── ...
                │   └── labels/
                │       ├── 000000.label
                │       ├── 000001.label
                │       └── ...
                ├── 08/ # for validation
                ├── 11/ # 11-21 for testing
                └── 21/
                    └── ...
```
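Each `velodyne` scan is a flat binary float32 array of (x, y, z, intensity) tuples; a minimal way to load one, assuming the directory layout above:

```python
import numpy as np

# Load one SemanticKITTI scan: float32 (x, y, z, intensity) per point.
scan = np.fromfile(
    "lidiff/Datasets/SemanticKITTI/dataset/sequences/00/velodyne/000000.bin",
    dtype=np.float32,
).reshape(-1, 4)
points = scan[:, :3]  # the xyz coordinates used for scene completion
```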
To generate the ground truth complete scenes you can run the map_from_scans.py script. It uses the dataset scans and poses to generate the sequence map used as ground truth during training:

```
python3 map_from_scans.py --path Datasets/SemanticKITTI/dataset/sequences/
```
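Conceptually, the map generation aggregates consecutive scans in a common frame; a simplified sketch of that step (illustrative only, not the actual script, and omitting details such as the sensor-to-map calibration):

```python
import numpy as np

def aggregate_scans(scans, poses):
    """Transform each (N_i, 3) scan into the global frame with its (4, 4)
    pose matrix and concatenate the results into one dense map."""
    world = []
    for pts, pose in zip(scans, poses):
        # Homogeneous coordinates so the pose can be applied as one matmul.
        homog = np.hstack([pts, np.ones((len(pts), 1), dtype=pts.dtype)])
        world.append((homog @ pose.T)[:, :3])
    return np.concatenate(world, axis=0)
```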
Once the sequence maps are generated, you can train the model.
For training the diffusion model, the configurations are defined in config/config.yaml, and the training can be started with:

```
python3 train.py
```
For training the refinement network, the configurations are defined in config/config_refine.yaml, and the training can be started with:

```
python3 train_refine.py
```
You can download the trained model weights and save them to lidiff/checkpoints/.
Based on this issue, we found a minor bug in our completion metric code where the metric was not computed over the entire point cloud. We have updated the code to fix it, and below we provide the updated results from Table 3 of the paper:
| Method | 0.5m | 0.2m | 0.1m |
|---|---|---|---|
| PVD | 15.91 | 3.97 | 0.60 |
| LMSCNet | 30.83 | 12.09 | 3.65 |
| LODE | 33.81 | 16.39 | 5.00 |
| MID | 31.58 | 22.72 | 13.14 |
| LiDiff | 31.47 | 16.79 | 4.67 |
| LiDiff (Refined) | 32.43 | 22.99 | 13.40 |
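For reference, the completion metric compares occupied voxels between the completed and ground truth scenes at each resolution; a simplified sketch of such a voxel-wise IoU (illustrative, not the repository's exact metric code):

```python
import numpy as np

def voxel_iou(pred_points, gt_points, voxel_size):
    """IoU between the sets of voxels occupied by two point clouds,
    at a given voxel size (e.g. 0.5, 0.2, or 0.1 meters)."""
    pred = {tuple(v) for v in np.floor(pred_points / voxel_size).astype(int)}
    gt = {tuple(v) for v in np.floor(gt_points / voxel_size).astype(int)}
    return len(pred & gt) / len(pred | gt)
```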
For running the scene completion inference we provide a pipeline where both the diffusion and refinement networks are loaded and used to complete the scene from an input scan. You can run the pipeline with:

```
python3 tools/diff_completion_pipeline.py --diff DIFF_CKPT --refine REFINE_CKPT -T DENOISING_STEPS -s CONDITIONING_WEIGHT
```
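For example, assuming the downloaded checkpoints were saved as lidiff/checkpoints/diff_net.ckpt and lidiff/checkpoints/refine_net.ckpt (placeholder file names; use those of the weights you downloaded), and with example values for the denoising steps and conditioning weight:

```
python3 tools/diff_completion_pipeline.py --diff checkpoints/diff_net.ckpt --refine checkpoints/refine_net.ckpt -T 50 -s 6.0
```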
We provide one scan as an example in lidiff/Datasets/test/, so you can directly test it out with our trained model by running the command above.
If you use this repo, please cite as:
```
@inproceedings{nunes2024cvpr,
    author = {Lucas Nunes and Rodrigo Marcuzzi and Benedikt Mersch and Jens Behley and Cyrill Stachniss},
    title = {{Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion}},
    booktitle = {{Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)}},
    year = {2024}
}
```