Panagiotis Koromilas, Efthymios Georgiou, Giorgos Bouritsas, Theodoros Giannakopoulos, Mihalis A. Nicolaou, Yannis Panagakis
This repository contains the official PyTorch implementation of our paper "A Principled Framework for Multi-View Contrastive Learning".
We introduce MV-InfoNCE and MV-DHEL, two theoretically grounded objectives for multi-view contrastive learning that:
- ✅ Properly incorporate interactions across all views (not just pairs!)
- ✅ Scale to an arbitrary number of views with consistent improvements
- ✅ Mitigate dimensionality collapse when using 5+ views
- ✅ Extend multimodal contrastive learning beyond two modalities
Our methods consistently outperform existing approaches across all benchmarks:
Current approaches simply aggregate pairwise losses, leading to:
- Conflicting Objectives: Each view must satisfy multiple competing loss terms
- Missing Interactions: Critical view relationships are ignored
- Coupled Optimization: Alignment and uniformity interfere with each other
- Poor Scaling: Benefits diminish with more views
Our framework addresses all these limitations with a principled mathematical foundation.
We identify three principles that any proper multi-view contrastive loss must satisfy:
| Principle | Description | PWE | PVC | MV-InfoNCE | MV-DHEL |
|---|---|---|---|---|---|
| P1: Simultaneous Alignment | All views aligned in one term | ❌ | ❌ | ✅ | ✅ |
| P2: Accurate Energy | Complete pairwise interactions | ❌ | ❌ | ✅ | ✅ |
| P3: One Term per Instance | Single optimization objective | ❌ | ❌ | ✅ | ✅ |
| Bonus: Decoupled Optimization | Alignment ⊥ Uniformity | ❌ | ❌ | ❌ | ✅ |
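For reference, the PWE baseline that the table contrasts against (aggregating standard InfoNCE over all view pairs) can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the code used in this repo; the tensor layout `U` of shape `(M, N, d)` and the default `tau` are assumptions:

```python
import numpy as np

def info_nce_pair(za, zb, tau=0.5):
    """Standard InfoNCE between two views (each M x d, L2-normalized)."""
    logits = za @ zb.T / tau                      # M x M scaled similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal

def pwe_loss(U, tau=0.5):
    """PWE baseline: average InfoNCE over all ordered view pairs.

    U has shape (M, N, d): M instances, N views, d embedding dims.
    Each view participates in 2*(N-1) separate loss terms, which is the
    'conflicting objectives' problem the framework above addresses.
    """
    M, N, d = U.shape
    terms = [info_nce_pair(U[:, l], U[:, m], tau)
             for l in range(N) for m in range(N) if l != m]
    return float(np.mean(terms))
```

Note how each embedding `U[i, l]` must simultaneously satisfy many independent pairwise terms, with no single term coupling all `N` views of an instance.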
MV-InfoNCE - Natural extension of InfoNCE to multiple views:
MV-DHEL - Decoupled optimization with superior efficiency:
Where:
- `i, j ∈ [M]`: instances in the batch
- `l, l', m ∈ [N]`: different views of the data
- `U_{i,l}`: representation of instance `i` in view `l`
- `τ`: temperature parameter
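To make the notation concrete, here is a small NumPy example of the embedding tensor and the temperature-scaled similarities that both objectives are built from (the shapes and values are illustrative only):

```python
import numpy as np

# Toy instantiation of the notation: M instances, N views, d dimensions.
M, N, d, tau = 4, 3, 8, 0.5
rng = np.random.default_rng(0)
U = rng.normal(size=(M, N, d))
U /= np.linalg.norm(U, axis=-1, keepdims=True)   # unit-norm embeddings U_{i,l}

# Temperature-scaled similarity between every U[i, l] and every U[j, m]:
flat = U.reshape(M * N, d)
sims = flat @ flat.T / tau        # shape (M*N, M*N) = (12, 12)
```

Each row of `sims` compares one view of one instance against every view of every instance in the batch, which is the set of interactions P2 requires a loss to account for.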
Train with our state-of-the-art MV-DHEL objective:
```bash
# MV-DHEL (Recommended)
python3 train.py \
    --config=configs/cifar_train_epochs200_bs256_05.yaml \
    --multiplier=4 \
    --loss=MVDHEL

# MV-InfoNCE
python3 train.py \
    --config=configs/cifar_train_epochs200_bs256_05.yaml \
    --multiplier=4 \
    --loss=MVINFONCE
```
Baseline Methods
```bash
# PVC (Poly-View Contrastive)
python3 train.py --config=configs/cifar_train_epochs200_bs256_05.yaml --multiplier=4 --loss=PVCLoss

# PWE (Pairwise Aggregation)
python3 train.py --config=configs/cifar_train_epochs200_bs256_05.yaml --multiplier=4 --loss=NTXent --loss_pwe=True

# AVG (Average Views)
python3 train.py --config=configs/cifar_train_epochs200_bs256_05.yaml --multiplier=4 --loss=NTXent --loss_avg=True
```
```bash
# Linear evaluation
python3 train.py \
    --config=configs/cifar_eval.yaml \
    --checkpoint=path/to/checkpoint.pth
```
- Theoretical Foundation: We prove both methods optimize for the same asymptotic behavior as InfoNCE
- Computational Efficiency: MV-DHEL has O(M²N) complexity vs O(M²N²) for other methods
- Dimensionality Preservation: With 5+ views, MV-DHEL fully utilizes the embedding space
- Multimodal Ready: Extends naturally to 3+ modalities (validated on sentiment analysis)
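The complexity gap above can be visualized by counting similarity terms. The counting scheme below is an assumption made for illustration (per-view uniformity for the O(M²N) bound); the exact structure of MV-DHEL's terms is given in the paper:

```python
# Counting pairwise similarity terms for M instances and N views.

def terms_all_pairs(M, N):
    """O(M^2 N^2): every embedding compared against every embedding,
    as when aggregating InfoNCE over all view pairs."""
    return (M * N) ** 2

def terms_per_view(M, N):
    """O(M^2 N): one set of within-view comparisons per view
    (an assumed realization of the O(M^2 N) count)."""
    return N * M ** 2

# The gap grows linearly in N: terms_all_pairs / terms_per_view == N.
for M, N in [(256, 2), (256, 4), (256, 8)]:
    print(M, N, terms_all_pairs(M, N), terms_per_view(M, N))
```

With a batch of 256 and 8 views, that is a factor-of-8 difference in the number of similarity terms evaluated per step.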
If you find our work useful, please cite:
```bibtex
@article{koromilas2025principled,
  title={A Principled Framework for Multi-View Contrastive Learning},
  author={Koromilas, Panagiotis and Georgiou, Efthymios and Bouritsas, Giorgos and
          Giannakopoulos, Theodoros and Nicolaou, Mihalis A. and Panagakis, Yannis},
  journal={arXiv preprint arXiv:2507.06979},
  year={2025}
}
```
This codebase builds upon SimCLR-PyTorch. We thank the authors for their excellent implementation.
For questions and discussions:
- Open an issue for bug reports or general questions
- Email: [email protected]