Jiho Choi1 *, Seonho Lee1 *, Seungho Lee2, Minhyun Lee2, Hyunjung Shim1
(* indicates equal contributions)
1Graduate School of Artificial Intelligence, KAIST, Republic of Korea
2School of Integrated Technology, Yonsei University, Republic of Korea
{jihochoi, glanceyes, kateshim}@kaist.ac.kr, {seungholee, lmh315}@yonsei.ac.kr
PartCATSeg is a novel framework that addresses the critical challenges of open-vocabulary part segmentation (OVPS) and significantly improves performance by leveraging the cost volume.
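For intuition, the cost volume mentioned above is a dense map of cosine similarities between image patch embeddings and the text embeddings of candidate object-part names. The sketch below is only illustrative (the tensor names, shapes, and the toy 116-class example are assumptions, not the project's actual code):

```python
import torch
import torch.nn.functional as F

def build_cost_volume(image_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
    """Illustrative cost-volume construction (not the official implementation).

    image_feats: (B, C, H, W) dense visual embeddings, e.g. from a CLIP/DINOv2 backbone.
    text_feats:  (T, C) text embeddings, one per object-part class name.
    Returns a (B, T, H, W) volume of cosine similarities.
    """
    B, C, H, W = image_feats.shape
    img = F.normalize(image_feats.flatten(2), dim=1)   # (B, C, H*W), unit-norm over channels
    txt = F.normalize(text_feats, dim=1)               # (T, C), unit-norm over channels
    cost = torch.einsum("bcn,tc->btn", img, txt)       # (B, T, H*W) cosine similarities
    return cost.view(B, -1, H, W)

# Toy usage with random features (116 hypothetical part classes)
volume = build_cost_volume(torch.randn(1, 512, 24, 24), torch.randn(116, 512))
print(volume.shape)  # torch.Size([1, 116, 24, 24])
```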
- [2025.04.15] The official code has been released!
- [2025.02.26] Our paper has been accepted to CVPR 2025!
- [2025.01.16] Our paper is now available! You can find the paper here.
# ------------------
# Init conda
# ------------------
conda create --name partcatseg python=3.8 -y
conda activate partcatseg
pip install --upgrade pip
conda install cuda=12.4.1 -c nvidia -y
pip install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cu121
pip install timm==0.9.1
pip install scikit-image==0.21.0
pip install scikit-learn==0.24.2
pip install opencv-python==4.5.5.64
pip install hydra-core==1.3.2
pip install openmim==0.3.6
pip install mmsegmentation==0.29.1
pip install tokenizers==0.11.1
pip install Pillow~=9.5
pip install numpy==1.23.0
pip install einops ftfy regex fire ninja psutil gdown
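Optionally, a quick sanity check (an illustrative snippet, not part of the repository) can confirm that the core packages installed above resolve to the expected versions and that CUDA is visible:

```python
# sanity_check.py -- verify the core dependencies installed above
import numpy
import timm
import torch
import torchvision

print("torch:", torch.__version__)               # expected 2.2.2
print("torchvision:", torchvision.__version__)   # expected 0.17.2
print("timm:", timm.__version__)                 # expected 0.9.1
print("numpy:", numpy.__version__)               # expected 1.23.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```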
# --------------------------
# Install Detectron2
# --------------------------
pip install 'git+https://github.com/facebookresearch/detectron2.git'
python -c "import detectron2; print(detectron2.__version__)" # 0.6
# --------------------------
# Install mmcv
# --------------------------
# If `pip install mmcv-full==1.7.1` fails to build, install from the prebuilt wheel index instead:
pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
python -c "import mmcv; print(mmcv.__version__)" # 1.7.1
cd datasets
- You can find further information in the OV-PARTS GitHub repository.
gdown https://drive.google.com/uc?id=1QF0BglrcC0teKqx15vP8qJNakGgCWaEH
tar -xzf PascalPart116.tar.gz
find ./PascalPart116/images/val/ -name '._*' -delete
find ./PascalPart116/ -name '._*' -delete
gdown https://drive.google.com/uc?id=1EBVPW_tqzBOQ_DC6yLcouyxR7WrctRKi
tar -xzf ADE20KPart234.tar.gz
- Download the `LOC_synset_mapping.txt` file from this link and place it in the `datasets` folder.
- Download `PartImageNet_Seg` from PartImageNet and extract it into the `datasets` folder as `PartImageNet`.
- Download `PartImageNet_OOD` from PartImageNet and place it into the `datasets` folder as `PartImageNet_OOD`.
- PascalPart116
- ADE20KPart234
- PartImageNet (Seg)
- PartImageNet (OOD)
# PascalPart116
python baselines/data/datasets/mask_cls_collect.py \
datasets/PascalPart116/annotations_detectron2_part/val \
datasets/PascalPart116/annotations_detectron2_part/val_part_label_count.json
python baselines/data/datasets/mask_cls_collect.py \
datasets/PascalPart116/annotations_detectron2_obj/val \
datasets/PascalPart116/annotations_detectron2_part/val_obj_label_count.json
# ADE20KPart234
# (no preprocessing required)
# PartImageNet (Seg)
cd datasets
python partimagenet_preprocess.py --data_dir PartImageNet
# Make sure to have LOC_synset_mapping.txt in the datasets folder mentioned above.
# PartImageNet (OOD)
# train split
find PartImageNet_OOD/images/train/* -type f -exec mv {} PartImageNet_OOD/images/train/ \;
find PartImageNet_OOD/images/train/* -type d -empty -delete
# val split
find PartImageNet_OOD/images/val/* -type f -exec mv {} PartImageNet_OOD/images/val/ \;
find PartImageNet_OOD/images/val/* -type d -empty -delete
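If you want to double-check that everything landed where the configs expect it, a small script like the following (illustrative only; it just mirrors the `datasets/` layout shown in the directory structure later in this README) prints which expected paths are present:

```python
from pathlib import Path

# Expected locations, mirroring the "datasets/" layout shown in the directory structure below.
expected = [
    "datasets/LOC_synset_mapping.txt",
    "datasets/PascalPart116/images",
    "datasets/PascalPart116/annotations_detectron2_obj",
    "datasets/PascalPart116/annotations_detectron2_part",
    "datasets/ADE20KPart234",
    "datasets/PartImageNet",
    "datasets/PartImageNet_OOD",
]
for p in expected:
    print("OK " if Path(p).exists() else "MISSING", p)
```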
Please first download the pre-trained CAT-Seg weights `model_final_base.pth` from the link below to use them for training.

| Model | Checkpoint |
|---|---|
| CAT-Seg | download link |

Make sure to place the downloaded weights in the `pretrain_weights` folder.
mkdir pretrain_weights && cd pretrain_weights
# CAT-Seg
wget https://huggingface.co/hamacojr/CAT-Seg/resolve/main/model_final_base.pth
cd ..
For evaluation, we provide pre-trained PartCATSeg weights for the following datasets:

| Model | Setting | Dataset | Checkpoint |
|---|---|---|---|
| PartCATSeg | zero-shot | Pascal-Part-116 | model |
| PartCATSeg | zero-shot | ADE20K-Part-234 | model |
| PartCATSeg | zero-shot | PartImageNet (Seg) | TBA |
mkdir weights && cd weights
# Pascal-Part-116
# PartCATSeg (partcatseg_voc.pth (928M on Ubuntu / 885M on Google Drive))
gdown https://drive.google.com/uc?id=1JUJjJQLMKE96H5SLNs4EMm4jiU6fPgRb
# ADE20K-Part-234
# PartCATSeg (partcatseg_ade.pth (928M on Ubuntu / 885M on Google Drive))
gdown https://drive.google.com/uc?id=1MKQEk71o9Xvs4aBY_GLyGa1lWTaW10fv
# PartImageNet (Seg)
# PartCATSeg (partcatseg_partimagenet_seg.pth)
# TBA
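To verify that a checkpoint downloaded completely, you can load it on CPU and count its parameter tensors. This is an illustrative check; the assumption that the weights are nested under a "model" key follows the usual Detectron2 checkpoint layout.

```python
import torch

# Load on CPU only to inspect the file; training/evaluation does not need this step.
ckpt = torch.load("weights/partcatseg_voc.pth", map_location="cpu")
# Detectron2-style checkpoints usually nest the weights under a "model" key (assumption).
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state)} entries loaded")
```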
# -------------
# Train
# -------------
python train_net.py \
--num-gpus 4 \
--config-file configs/zero_shot/partcatseg_voc.yaml
# -----------------
# Inference
# -----------------
python train_net.py \
--num-gpus 4 \
--config-file configs/zero_shot/partcatseg_voc.yaml \
--eval-only MODEL.WEIGHTS ./weights/partcatseg_voc.pth
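The trailing `MODEL.WEIGHTS ./weights/partcatseg_voc.pth` pair is a standard Detectron2 command-line override: any `KEY VALUE` pairs after the flags are merged into the config. The minimal sketch below shows the generic Detectron2 mechanism, not PartCATSeg-specific code:

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Key/value pairs passed on the command line are merged into the config like this:
cfg.merge_from_list(["MODEL.WEIGHTS", "./weights/partcatseg_voc.pth"])
print(cfg.MODEL.WEIGHTS)  # ./weights/partcatseg_voc.pth
```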
Please note that performance on `PartImageNet_OOD` should be evaluated with the 'mIoU-unbase' metric, since the validation set of `PartImageNet_OOD` does not contain the base classes.
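For reference, 'mIoU-unbase' simply averages the per-class IoU over the unseen (non-base) classes. A minimal sketch of the idea follows; the class names and the base/novel split here are made up for illustration and this is not the evaluator's actual code.

```python
import numpy as np

def miou_unbase(per_class_iou: dict, base_classes: set) -> float:
    """Mean IoU restricted to classes outside the base (seen) set."""
    novel = [iou for name, iou in per_class_iou.items() if name not in base_classes]
    return float(np.mean(novel)) if novel else float("nan")

# Toy example: only the two novel part classes contribute to mIoU-unbase.
ious = {"car wheel": 0.48, "dog head": 0.62, "dog torso": 0.55}
print(miou_unbase(ious, base_classes={"car wheel"}))  # 0.585
```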
[PROJECT_ROOT]
├── datasets/
│   ├── PascalPart116/
│   │   ├── images
│   │   ├── annotations_detectron2_obj
│   │   └── annotations_detectron2_part
│   ├── ADE20KPart234/
│   ├── PartImageNet/
│   └── PartImageNet_OOD/
│       ├── images
│       ├── annotations_detectron2_part
│       ├── train.json
│       └── val.json
├── weights/
│   ├── partcatseg_voc.pth
│   └── partcatseg_ade.pth
├── configs/
│   └── zero_shot/
│       ├── partcatseg_voc.yaml
│       ├── partcatseg_ade.yaml
│       ├── partcatseg_partimagenet.yaml
│       ├── partcatseg_partimagenet_ood.yaml
│       └── ...
├── baselines/
│   ├── evaluation/
│   │   └── partcatseg_evaluation.py
│   ├── data/
│   │   ├── dataset_mappers/
│   │   │   └── object_part_mapper.py
│   │   └── datasets/
│   │       └── register_pascal_part_116.py
│   ├── modeling/
│   │   ├── backbone/
│   │   │   └── dinov2_backbone.py
│   │   ├── heads/
│   │   │   └── part_cat_seg_head.py
│   │   └── transformer/
│   │       ├── part_cat_seg_model.py
│   │       └── part_cat_seg_predictor.py
│   ├── utils/
│   │   └── visualizer.py
│   ├── third_party/
│   ├── partcatseg.py
│   └── config.py
├── README.md
└── train_net.py
We would like to express our gratitude to the open-source projects and their contributors, including PartCLIPSeg, OV-PARTS, CLIPSeg, Mask2Former, CLIP, and OV-DETR.