Jiho Choi1 *, Seonho Lee1 *, Seungho Lee2, Minhyun Lee2, Hyunjung Shim1
(* indicates equal contributions)
1Graduate School of Artificial Intelligence, KAIST, Republic of Korea
2School of Integrated Technology, Yonsei University, Republic of Korea
{jihochoi, glanceyes, kateshim}@kaist.ac.kr, {seungholee, lmh315}@yonsei.ac.kr
PartCATSeg is a novel framework that addresses the critical challenges of open-vocabulary part segmentation (OVPS) and significantly improves performance by leveraging the cost volume.
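For intuition, the cost volume mentioned above is a dense map of cosine similarities between image patch embeddings and the text embeddings of candidate object-part names. The sketch below is only illustrative (the tensor names, shapes, and the toy 116-class example are assumptions, not the project's actual code):

```python
import torch
import torch.nn.functional as F

def build_cost_volume(image_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
    """Illustrative cost-volume construction (not the official implementation).

    image_feats: (B, C, H, W) dense visual embeddings, e.g. from a CLIP/DINOv2 backbone.
    text_feats:  (T, C) text embeddings, one per object-part class name.
    Returns a (B, T, H, W) volume of cosine similarities.
    """
    B, C, H, W = image_feats.shape
    img = F.normalize(image_feats.flatten(2), dim=1)   # (B, C, H*W), unit-norm over channels
    txt = F.normalize(text_feats, dim=1)               # (T, C), unit-norm over channels
    cost = torch.einsum("bcn,tc->btn", img, txt)       # (B, T, H*W) cosine similarities
    return cost.view(B, -1, H, W)

# Toy usage with random features (116 hypothetical part classes)
volume = build_cost_volume(torch.randn(1, 512, 24, 24), torch.randn(116, 512))
print(volume.shape)  # torch.Size([1, 116, 24, 24])
```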
- [2025.04.15] The official code has been released!
- [2025.02.26] Our paper has been accepted to CVPR 2025!
- [2025.01.16] Our paper is now available! You can find the paper here.
# ------------------
# Init conda
# ------------------
conda create --name partcatseg python=3.8 -y
conda activate partcatseg
pip install --upgrade pip
conda install cuda=12.4.1 -c nvidia -y
pip install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cu121
pip install timm==0.9.1
pip install scikit-image==0.21.0
pip install scikit-learn==0.24.2
pip install opencv-python==4.5.5.64
pip install hydra-core==1.3.2
pip install openmim==0.3.6
pip install mmsegmentation==0.29.1
pip install tokenizers==0.11.1
pip install Pillow~=9.5
pip install numpy==1.23.0
pip install einops ftfy regex fire ninja psutil gdown
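Optionally, a quick sanity check (an illustrative snippet, not part of the repository) can confirm that the core packages installed above resolve to the expected versions and that CUDA is visible:

```python
# sanity_check.py -- verify the core dependencies installed above
import numpy
import timm
import torch
import torchvision

print("torch:", torch.__version__)               # expected 2.2.2
print("torchvision:", torchvision.__version__)   # expected 0.17.2
print("timm:", timm.__version__)                 # expected 0.9.1
print("numpy:", numpy.__version__)               # expected 1.23.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```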
# --------------------------
# Install Detectron2
# --------------------------
pip install 'git+https://github.com/facebookresearch/detectron2.git'
python -c "import detectron2; print(detectron2.__version__)" # 0.6
# --------------------------
# Install mmcv
# --------------------------
# If `pip install mmcv-full==1.7.1` fails to build, install from the prebuilt wheel index instead:
pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
python -c "import mmcv; print(mmcv.__version__)" # 1.7.1
cd datasets
- You can find further information in the OV-PARTS GitHub repository.
gdown https://drive.google.com/uc?id=1QF0BglrcC0teKqx15vP8qJNakGgCWaEH
tar -xzf PascalPart116.tar.gz
find ./PascalPart116/images/val/ -name '._*' -delete
find ./PascalPart116/ -name '._*' -delete
gdown https://drive.google.com/uc?id=1EBVPW_tqzBOQ_DC6yLcouyxR7WrctRKi
tar -xzf ADE20KPart234.tar.gz
- Download the `LOC_synset_mapping.txt` file from this link and place it in the `datasets` folder.
- Download `PartImageNet_Seg` from PartImageNet and extract it into the `datasets` folder as `PartImageNet`.
- Download `PartImageNet_OOD` from PartImageNet and place it into the `datasets` folder as `PartImageNet_OOD`.
- PascalPart116
- ADE20KPart234
- PartImageNet (Seg)
- PartImageNet (OOD)
# PascalPart116
python baselines/data/datasets/mask_cls_collect.py \
datasets/PascalPart116/annotations_detectron2_part/val \
datasets/PascalPart116/annotations_detectron2_part/val_part_label_count.json
python baselines/data/datasets/mask_cls_collect.py \
datasets/PascalPart116/annotations_detectron2_obj/val \
datasets/PascalPart116/annotations_detectron2_part/val_obj_label_count.json
# ADE20KPart234
# (no preprocessing required)
# PartImageNet (Seg)
cd datasets
python partimagenet_preprocess.py --data_dir PartImageNet
# Make sure to have LOC_synset_mapping.txt in the datasets folder mentioned above.
# PartImageNet (OOD)
# train split
find PartImageNet_OOD/images/train/* -type f -exec mv {} PartImageNet_OOD/images/train/ \;
find PartImageNet_OOD/images/train/* -type d -empty -delete
# val split
find PartImageNet_OOD/images/val/* -type f -exec mv {} PartImageNet_OOD/images/val/ \;
find PartImageNet_OOD/images/val/* -type d -empty -delete
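If you want to double-check that everything landed where the configs expect it, a small script like the following (illustrative only; it just mirrors the `datasets/` layout shown in the directory structure later in this README) prints which expected paths are present:

```python
from pathlib import Path

# Expected locations, mirroring the "datasets/" layout shown in the directory structure below.
expected = [
    "datasets/LOC_synset_mapping.txt",
    "datasets/PascalPart116/images",
    "datasets/PascalPart116/annotations_detectron2_obj",
    "datasets/PascalPart116/annotations_detectron2_part",
    "datasets/ADE20KPart234",
    "datasets/PartImageNet",
    "datasets/PartImageNet_OOD",
]
for p in expected:
    print("OK " if Path(p).exists() else "MISSING", p)
```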
Please first download the pre-trained CAT-Seg weights `model_final_base.pth` from the link below to use them for training.

| Model | Checkpoint |
|---|---|
| CAT-Seg | download link |

Make sure to place the downloaded weights in the `pretrain_weights` folder.
mkdir pretrain_weights && cd pretrain_weights
# CAT-Seg
wget https://huggingface.co/hamacojr/CAT-Seg/resolve/main/model_final_base.pth
cd ..
For evaluation, we provide pre-trained PartCATSeg weights for the following datasets:

| Model | Setting | Dataset | Checkpoint |
|---|---|---|---|
| PartCATSeg | zero-shot | Pascal-Part-116 | model |
| PartCATSeg | zero-shot | ADE20K-Part-234 | model |
| PartCATSeg | zero-shot | PartImageNet (Seg) | TBA |
mkdir weights && cd weights
# Pascal-Part-116
# PartCATSeg (partcatseg_voc.pth (928M on Ubuntu / 885M on Google Drive))
gdown https://drive.google.com/uc?id=1JUJjJQLMKE96H5SLNs4EMm4jiU6fPgRb
# ADE20K-Part-234
# PartCATSeg (partcatseg_ade.pth (928M on Ubuntu / 885M on Google Drive))
gdown https://drive.google.com/uc?id=1MKQEk71o9Xvs4aBY_GLyGa1lWTaW10fv
# PartImageNet (Seg)
# PartCATSeg (partcatseg_partimagenet_seg.pth)
# TBA
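To verify that a checkpoint downloaded completely, you can load it on CPU and count its parameter tensors. This is an illustrative check; the assumption that the weights are nested under a "model" key follows the usual Detectron2 checkpoint layout.

```python
import torch

# Load on CPU only to inspect the file; training/evaluation does not need this step.
ckpt = torch.load("weights/partcatseg_voc.pth", map_location="cpu")
# Detectron2-style checkpoints usually nest the weights under a "model" key (assumption).
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state)} entries loaded")
```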
# -------------
# Train
# -------------
python train_net.py \
--num-gpus 4 \
--config-file configs/zero_shot/partcatseg_voc.yaml
# -----------------
# Inference
# -----------------
python train_net.py \
--num-gpus 4 \
--config-file configs/zero_shot/partcatseg_voc.yaml \
--eval-only MODEL.WEIGHTS ./weights/partcatseg_voc.pth
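The trailing `MODEL.WEIGHTS ./weights/partcatseg_voc.pth` pair is a standard Detectron2 command-line override: any `KEY VALUE` pairs after the flags are merged into the config. The minimal sketch below shows the generic Detectron2 mechanism, not PartCATSeg-specific code:

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Key/value pairs passed on the command line are merged into the config like this:
cfg.merge_from_list(["MODEL.WEIGHTS", "./weights/partcatseg_voc.pth"])
print(cfg.MODEL.WEIGHTS)  # ./weights/partcatseg_voc.pth
```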
Please note that performance on `PartImageNet_OOD` should be evaluated with the 'mIoU-unbase' metric, since the validation set of `PartImageNet_OOD` does not contain the base classes.
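For reference, 'mIoU-unbase' simply averages the per-class IoU over the unseen (non-base) classes. A minimal sketch of the idea follows; the class names and the base/novel split here are made up for illustration and this is not the evaluator's actual code.

```python
import numpy as np

def miou_unbase(per_class_iou: dict, base_classes: set) -> float:
    """Mean IoU restricted to classes outside the base (seen) set."""
    novel = [iou for name, iou in per_class_iou.items() if name not in base_classes]
    return float(np.mean(novel)) if novel else float("nan")

# Toy example: only the two novel part classes contribute to mIoU-unbase.
ious = {"car wheel": 0.48, "dog head": 0.62, "dog torso": 0.55}
print(miou_unbase(ious, base_classes={"car wheel"}))  # 0.585
```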
[PROJECT_ROOT]
├── datasets/
│   ├── PascalPart116/
│   │   ├── images
│   │   ├── annotations_detectron2_obj
│   │   └── annotations_detectron2_part
│   ├── ADE20KPart234/
│   ├── PartImageNet/
│   └── PartImageNet_OOD/
│       ├── images
│       ├── annotations_detectron2_part
│       ├── train.json
│       └── val.json
├── weights/
│   ├── partcatseg_voc.pth
│   └── partcatseg_ade.pth
├── configs/
│   └── zero_shot/
│       ├── partcatseg_voc.yaml
│       ├── partcatseg_ade.yaml
│       ├── partcatseg_partimagenet.yaml
│       ├── partcatseg_partimagenet_ood.yaml
│       └── ...
├── baselines/
│   ├── evaluation/
│   │   └── partcatseg_evaluation.py
│   ├── data/
│   │   ├── dataset_mappers/
│   │   │   └── object_part_mapper.py
│   │   └── datasets/
│   │       └── register_pascal_part_116.py
│   ├── modeling/
│   │   ├── backbone/
│   │   │   └── dinov2_backbone.py
│   │   ├── heads/
│   │   │   └── part_cat_seg_head.py
│   │   └── transformer/
│   │       ├── part_cat_seg_model.py
│   │       └── part_cat_seg_predictor.py
│   ├── utils/
│   │   └── visualizer.py
│   ├── third_party/
│   ├── partcatseg.py
│   └── config.py
├── README.md
└── train_net.py
We would like to express our gratitude to the open-source projects and their contributors, including PartCLIPSeg, OV-PARTS, CLIPSeg, Mask2Former, CLIP, and OV-DETR.