This repository contains a Python implementation of the paper "Robust Conformal Outlier Detection under Contaminated Reference Data". It includes the proposed Label-Trim method, implementations of the baseline methods, and the code for the real-data experiments.
Conformal prediction is a flexible framework for calibrating machine learning predictions, providing distribution-free statistical guarantees. In outlier detection, this calibration relies on a reference set of labeled inlier data to control the type-I error rate. However, obtaining a perfectly labeled inlier reference set is often unrealistic, and a more practical scenario involves access to a contaminated reference set containing a small fraction of outliers. This paper analyzes the impact of such contamination on the validity of conformal methods. We prove that under realistic, non-adversarial settings, calibration on contaminated data yields conservative type-I error control, shedding light on the inherent robustness of conformal methods. This conservativeness, however, typically results in a loss of power. To alleviate this limitation, we propose a novel, active data-cleaning framework that leverages a limited labeling budget and an outlier detection model to selectively annotate data points in the contaminated reference set that are suspected as outliers. By removing only the annotated outliers in this "suspicious" subset, we can effectively enhance power while mitigating the risk of inflating the type-I error rate, as supported by our theoretical analysis. Experiments on real datasets validate the conservative behavior of conformal methods under contamination and show that the proposed data-cleaning strategy improves power without sacrificing validity.
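For intuition, the sketch below shows standard conformal outlier detection with a contaminated reference set. It is a self-contained toy example, not the repository's Label-Trim implementation: the Gaussian score distributions, the 3% contamination rate, and the idealized cleaning step (all outliers assumed annotated and removed) are assumptions made purely for illustration.

```python
# Toy sketch of conformal outlier detection with a contaminated reference set.
# NOT the repository's implementation; all numbers below are illustrative.
import numpy as np

def conformal_pvalue(test_score, cal_scores):
    """Conformal p-value; larger scores are treated as more outlying."""
    n = len(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (n + 1)

rng = np.random.default_rng(0)
n_cal, p_cal = 2000, 0.03                        # reference size and contamination rate
is_outlier = rng.random(n_cal) < p_cal           # hidden contamination labels
cal_scores = np.where(is_outlier,
                      rng.normal(3.0, 1.0, n_cal),   # outlier scores (larger on average)
                      rng.normal(0.0, 1.0, n_cal))   # inlier scores

# Contamination inflates the reference scores, and hence every p-value:
# type-I error control for inliers becomes conservative, but power against
# true outliers drops. Removing annotated outliers tightens the reference set.
test_outlier_score = rng.normal(3.0, 1.0)        # a hypothetical outlying test point
print("p-value, contaminated reference:",
      conformal_pvalue(test_outlier_score, cal_scores))
print("p-value, cleaned reference:     ",
      conformal_pvalue(test_outlier_score, cal_scores[~is_outlier]))  # idealized cleaning
```

In the paper's framework, only a small "suspicious" subset flagged by an outlier detection model is annotated within a labeling budget, rather than the idealized full cleaning shown here.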
To install dependencies, create a Conda environment using `requirements.yml`:
conda env update -f requirements.yml
conda activate robust-cod
For experiments involving visual data, you must provide a path to a score dataset, i.e., outlier scores extracted from images using a pretrained model and an outlier detection model.
To generate such a dataset, use: python ImageOD/score_datasets.py
We use datasets and pretrained models from OpenOOD for outlier detection. The required datasets and checkpoint files can be downloaded from the OpenOOD GitHub repository; refer to it for download instructions.
⚡ Running `python ImageOD/score_datasets.py` automatically downloads missing datasets. Checkpoints must be downloaded beforehand.
To train a ReAct-based outlier detection model using a ResNet18 pretrained on CIFAR-10, with the Texture dataset as the outlier dataset and a contamination rate of 0.03, run:
python ImageOD/score_datasets.py --save_path ./datasets_scores/cifar10_texture/n_train_2000_0.03/ \
--id_dataset cifar10 --ood_dataset texture --postprocess react \
--net_ckpt_path ./openood/results/checkpoints/cifar10_resnet18_32x32_base_e100_lr0.1_default/s1/best.ckpt \
--net resnet18_32x32 --p_train 0.03 --n_train 2000
- `--id_dataset cifar10` → In-distribution dataset (CIFAR-10).
- `--ood_dataset texture` → Out-of-distribution dataset (Texture).
- `--postprocess react` → Applies ReAct post-processing.
- `--net_ckpt_path` → Path to the pretrained model checkpoint.
- `--p_train 0.03` → Contamination rate of 3%.
- `--n_train 2000` → Number of training samples.
Once the dataset is created, you can use it in experiments by specifying its path.
You can run experiments using `main.py`, either by specifying parameters directly or by using a YAML configuration file.
To run a single experiment with custom parameters, use `main.py` directly.
For example, to compare all methods on the shuttle dataset using an Isolation Forest model with a labeling budget of m=50, run:
python main.py --save_path ./results/ --model IF --level 0.01 --n_cal 2500 --p_cal 0.03 \
--n_train 5000 --n_test 1000 --p_test 0.05 --dataset shuttle --initial_labeled 50
For a full list of command-line arguments, run:
python main.py --help
Instead of specifying parameters manually, you can use a YAML config file to define all parameters.
For example, to run a contamination rate experiment, execute:
python run_exp.py -c ./experiments/tabular_data/real_data_shuttle_contamination_rate_exp.yml -s ./results/
- The YAML file can contain lists of parameters, allowing you to run all possible parameter combinations automatically (see the sketch below).
- Experiment configuration files are stored in the `experiments/` folder.
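As a rough illustration of what "lists of parameters" means, the snippet below expands a small YAML fragment into every parameter combination. The keys and values are hypothetical and the expansion logic is a sketch, not `run_exp.py` itself; see the files under `experiments/` for the actual configuration schema.

```python
# Hypothetical sketch of expanding a YAML config with list-valued parameters
# into all combinations; not the actual logic of run_exp.py.
import itertools
import yaml  # provided by the PyYAML package

config_text = """
dataset: [shuttle]
model: [IF]
level: [0.01, 0.02, 0.03]
p_cal: [0.01, 0.03, 0.05]
"""
config = yaml.safe_load(config_text)

keys = list(config)
for values in itertools.product(*(config[k] for k in keys)):
    params = dict(zip(keys, values))
    print(params)   # one experiment configuration per combination (9 in total)
```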
You can run experiments with multiple target Type-I error rate levels by providing a list of values. For example:
--level 0.01 0.02 0.03
When plotting the results, you can filter by a specific target Type-I error level and repeat the process for different levels (this will be clearer after reading the "Plotting Experiment Results" section). For example:
--filter_k level --filter_v 0.01
By default, the code is designed to run on a computing cluster using the SLURM scheduler for distributed execution.
- Each random seed is executed as a separate SLURM job to enable parallelism.
- To run experiments locally, add the `--local` flag when executing `main.py` (or set it in the YAML file under `flag_params`).
- To disable seed-based job distribution, use the `--no_distribute` flag (or set it in the YAML file under `flag_params`).
Example:
python main.py --local --no_distribute
Once the experiments are complete, use `plot_main.py` to visualize the results. For example:
python plot_main.py --results_dir ./results/ --plot_dir ./plots/
- To plot a specific experiment, use the `--x` argument.
- To filter results before plotting, use `--filter_k <key> --filter_v <value>` (see the sketch after this list).
- To generate a summary table, add the `--table` flag.
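Conceptually, filtering keeps only the rows of the results table whose `<key>` column equals `<value>` before plotting. The snippet below is only an illustration with made-up placeholder column names and numbers; it is not how `plot_main.py` actually reads its result files.

```python
# Illustration of the effect of --filter_k level --filter_v 0.01;
# column names and values are made-up placeholders.
import pandas as pd

results = pd.DataFrame({
    "level": [0.01, 0.01, 0.02, 0.02],
    "seed":  [0, 1, 0, 1],
    "power": [0.6, 0.7, 0.8, 0.9],   # placeholder values, not real results
})

filtered = results[results["level"] == 0.01]   # keep rows where <key> equals <value>
print(filtered)
```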
For a full list of command-line arguments, run:
python plot_main.py --help
This project is licensed under the MIT License.
If you use this code or ideas from this project in your research, please cite:
@inproceedings{bashari2025robust,
title={Robust Conformal Outlier Detection under Contaminated Reference Data},
author={Meshi Bashari and Matteo Sesia and Yaniv Romano},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=s55Af9Emyq}
}