nonconform adds uncertainty quantification to anomaly detection. It wraps most detectors from PyOD (see Supported Estimators) and leverages one-class classification and conformal inference to enable statistically rigorous anomaly detection.
- Uncertainty Quantification: Turn anomaly scores into statistically valid p-values.
- Control False Positives: Reliably control metrics like the False Discovery Rate (FDR).
- PyOD Compatibility: Works with most PyOD anomaly detectors (see Supported Estimators).
Installation via PyPI:
pip install nonconform
Note: The following examples use the built-in datasets. To run them, install the data extra (see Optional Dependencies):
pip install nonconform[data]
Example: Detecting anomalies with Isolation Forest on the Shuttle dataset. The approach splits data for calibration, trains the model, then converts anomaly scores to statistically valid p-values by comparing test scores against the calibration distribution.
from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control

from nonconform.estimation import StandardConformalDetector
from nonconform.strategy import Split
from nonconform.utils.data import load_shuttle
from nonconform.utils.stat import false_discovery_rate, statistical_power

# Built-in dataset setup: normal training data plus a labeled test set
x_train, x_test, y_test = load_shuttle(setup=True, seed=42)

# Wrap a PyOD detector; Split() holds out 1,000 points for calibration
estimator = StandardConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
)

estimator.fit(x_train)
estimates = estimator.predict(x_test)  # conformal p-values, one per test point

# Benjamini-Hochberg procedure: flag anomalies while controlling the FDR at 10%
decisions = false_discovery_control(estimates, method='bh') <= 0.1

print(f"Empirical False Discovery Rate: {false_discovery_rate(y=y_test, y_hat=decisions)}")
print(f"Empirical Statistical Power (Recall): {statistical_power(y=y_test, y_hat=decisions)}")
Output:
Empirical False Discovery Rate: 0.058
Empirical Statistical Power (Recall): 0.97
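Under the hood, each conformal p-value is the rank of a test point's anomaly score within the held-out calibration scores. As a minimal sketch of this idea (illustrative only, not nonconform's internal code):

import numpy as np

def conformal_p_value(test_scores, calib_scores):
    # p = (1 + #{calibration scores >= test score}) / (n_calib + 1);
    # the +1 correction makes the p-value valid under exchangeability
    calib_scores = np.asarray(calib_scores)
    n = len(calib_scores)
    return np.array(
        [(1 + np.sum(calib_scores >= s)) / (n + 1) for s in test_scores]
    )

calib = np.random.default_rng(0).normal(size=1_000)
print(conformal_p_value([3.5, 0.0], calib))  # ~0.001 for the extreme score, ~0.5 for a typical one

Points that score higher than almost all calibration data receive small p-values, which the Benjamini-Hochberg step above then converts into FDR-controlled decisions.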
Other conformal detector wrappers exist for advanced use cases, including WeightedConformalDetector() (robust to covariate shift) and alternative calibration strategies such as Bootstrap() for improved results:
from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control

from nonconform.estimation import StandardConformalDetector, WeightedConformalDetector
from nonconform.strategy import Bootstrap, Split
from nonconform.utils.data import load_shuttle
from nonconform.utils.stat import false_discovery_rate, statistical_power

x_train, x_test, y_test = load_shuttle(setup=True, seed=42)

SPLIT = 2500  # fixed calibration set size

# Standard Split Strategy
ssd = StandardConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=SPLIT),
)
ssd.fit(x_train)
ssd_estimates = ssd.predict(x_test)
ssd_decisions = false_discovery_control(ssd_estimates, method='bh') <= 0.1
print(f"[Standard Split] Empirical FDR: {false_discovery_rate(y=y_test, y_hat=ssd_decisions)}")
print(f"[Standard Split] Empirical Power: {statistical_power(y=y_test, y_hat=ssd_decisions)}")

# Bootstrapping Strategy
sbt = StandardConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Bootstrap(n_bootstraps=20, n_calib=SPLIT),
)
sbt.fit(x_train)
sbt_estimates = sbt.predict(x_test)
sbt_decisions = false_discovery_control(sbt_estimates, method='bh') <= 0.1
print(f"[Standard Boot] Empirical FDR: {false_discovery_rate(y=y_test, y_hat=sbt_decisions)}")
print(f"[Standard Boot] Empirical Power: {statistical_power(y=y_test, y_hat=sbt_decisions)}")

# Weighted Strategy (Covariate Shift Robustness)
wcd = WeightedConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=SPLIT),
)
wcd.fit(x_train)
wcd_estimates = wcd.predict(x_test)
wcd_decisions = false_discovery_control(wcd_estimates, method='bh') <= 0.1
print(f"[Weighted Split] Empirical FDR: {false_discovery_rate(y=y_test, y_hat=wcd_decisions)}")
print(f"[Weighted Split] Empirical Power: {statistical_power(y=y_test, y_hat=wcd_decisions)}")
Output:
[Standard Split] Empirical FDR: 0.085
[Standard Split] Empirical Power: 0.97
[Standard Boot] Empirical FDR: 0.139
[Standard Boot] Empirical Power: 0.99
[Weighted Split] Empirical FDR: 0.031
[Weighted Split] Empirical Power: 0.95
While primarily designed for static (single-batch) applications, the library supports streaming scenarios through BatchGenerator() and OnlineGenerator(). For statistically valid FDR control on streaming data, use the optional online-fdr dependency (installed via nonconform[fdr]), which implements appropriate sequential testing methods.
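As a rough sketch of per-batch usage (the generator utilities have their own configuration, so a plain NumPy split stands in for BatchGenerator() here, and the data setup is purely illustrative):

import numpy as np
from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control

from nonconform.estimation import StandardConformalDetector
from nonconform.strategy import Split

rng = np.random.default_rng(42)
x_train = rng.normal(size=(5_000, 8))   # illustrative normal training data
x_stream = rng.normal(size=(1_000, 8))  # illustrative incoming stream

detector = StandardConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
)
detector.fit(x_train)

# Stand-in for BatchGenerator(): process the stream in fixed-size batches
for batch in np.array_split(x_stream, 10):
    p_values = detector.predict(batch)
    # Per-batch BH control; for valid FDR control across the whole stream,
    # use the sequential methods from the optional online-fdr package instead
    decisions = false_discovery_control(p_values, method='bh') <= 0.1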
If you find this repository useful for your research, please cite the following papers:
@inproceedings{Hennhofer2024,
  title     = {{Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors}},
  author    = {Hennhofer, Oliver and Preisach, Christine},
  year      = {2024},
  month     = dec,
  booktitle = {2024 IEEE International Conference on Knowledge Graph (ICKG)},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
  pages     = {110--119},
  doi       = {10.1109/ICKG63256.2024.00022},
  url       = {https://doi.ieeecomputersociety.org/10.1109/ICKG63256.2024.00022}
}
@article{Bates2023,
  title     = {Testing for outliers with conformal p-values},
  author    = {Bates, Stephen and Candès, Emmanuel and Lei, Lihua and Romano, Yaniv and Sesia, Matteo},
  year      = {2023},
  month     = feb,
  journal   = {The Annals of Statistics},
  publisher = {Institute of Mathematical Statistics},
  volume    = {51},
  number    = {1},
  doi       = {10.1214/22-aos2244},
  issn      = {0090-5364},
  url       = {http://dx.doi.org/10.1214/22-AOS2244}
}
@inproceedings{Jin2023,
  title  = {Model-free selective inference under covariate shift via weighted conformal p-values},
  author = {Jin, Ying and Candès, Emmanuel J.},
  year   = {2023},
  url    = {https://api.semanticscholar.org/CorpusID:259950903}
}
For additional features, you might need optional dependencies:
- pip install nonconform[data]: includes pyarrow for loading example data (via remote download)
- pip install nonconform[deep]: includes deep learning dependencies (PyTorch)
- pip install nonconform[fdr]: includes advanced FDR control methods (online-fdr)
- pip install nonconform[dev]: includes development tools (black, ruff, pre-commit)
- pip install nonconform[docs]: includes documentation building tools (sphinx, furo, etc.)
- pip install nonconform[all]: includes all optional dependencies
Please refer to the pyproject.toml for details.
Only anomaly estimators suitable for unsupervised one-class classification are supported. Because detectors are trained exclusively on normal data, threshold-related parameters (such as PyOD's contamination) are automatically set to minimal values.
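This is safe because in PyOD the contamination parameter only shifts the internal label cutoff; the raw anomaly scores that nonconform calibrates on are unaffected. A quick check illustrating this (assuming PyOD's standard IForest interface):

import numpy as np
from pyod.models.iforest import IForest

x = np.random.default_rng(0).normal(size=(500, 4))
low = IForest(contamination=0.01, random_state=0).fit(x)
high = IForest(contamination=0.30, random_state=0).fit(x)

# contamination changes only the binary-label threshold, not the scores
assert np.allclose(low.decision_scores_, high.decision_scores_)
assert low.threshold_ != high.threshold_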
Models that are currently supported include:
- Angle-Based Outlier Detection (ABOD)
- Autoencoder (AE)
- Cook's Distance (CD)
- Copula-based Outlier Detector (COPOD)
- Deep Isolation Forest (DIF)
- Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
- Gaussian Mixture Model (GMM)
- Histogram-based Outlier Detection (HBOS)
- Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
- Isolation Forest (IForest)
- Kernel Density Estimation (KDE)
- k-Nearest Neighbor (kNN)
- Kernel Principal Component Analysis (KPCA)
- Linear Model Deviation-based Outlier Detection (LMDD)
- Local Outlier Factor (LOF)
- Local Correlation Integral (LOCI)
- Lightweight Online Detector of Anomalies (LODA)
- Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
- GNN-based Anomaly Detection Method (LUNAR)
- Median Absolute Deviation (MAD)
- Minimum Covariance Determinant (MCD)
- One-Class SVM (OCSVM)
- Principal Component Analysis (PCA)
- Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
- Rotation-based Outlier Detection (ROD)
- Subspace Outlier Detection (SOD)
- Scalable Unsupervised Outlier Detection (SUOD)
Bug reporting: https://github.com/OliverHennhoefer/nonconform/issues