Releases: Lightning-AI/torchmetrics
First video and vertex metrics
The upcoming TorchMetrics v1.8.0 release introduces three flagship metrics, each designed to address critical evaluation needs in real-world applications.
Video Multi-Method Assessment Fusion (VMAF) brings a perceptual video-quality score that closely mirrors human judgment. Streaming services such as Netflix and YouTube use it to optimize encoding ladders for a consistent viewer experience, and video-restoration labs use it to quantify the improvements achieved by denoising and super-resolution algorithms.
Continuous Ranked Probability Score (CRPS) enables comprehensive evaluation of full predictive distributions rather than point estimates; meteorological centers leverage CRPS to benchmark probabilistic precipitation and temperature forecasts, improving public weather alerts, while energy companies apply it to assess uncertainty in load-demand predictions and refine grid management and trading strategies.
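As a concrete reference point for what CRPS measures, the sketch below computes it for a forecast given as a finite ensemble of samples, using the well-known identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast distribution F. This is a minimal pure-Python illustration assuming scalar observations, not the TorchMetrics CRPS implementation, which works on tensors and may normalize the spread term differently (for example, the "fair" variant divides by m(m - 1) instead of m²).

```python
def crps_ensemble(members: list[float], observation: float) -> float:
    """Ensemble estimate of CRPS: mean |x_i - y| minus half the mean |x_i - x_j|.

    Illustrative sketch only; not the TorchMetrics implementation.
    """
    m = len(members)
    if m == 0:
        raise ValueError("the ensemble must contain at least one member")
    # Accuracy term: average distance from each member to the observation.
    skill = sum(abs(x - observation) for x in members) / m
    # Spread term: average pairwise distance between members (all m*m pairs).
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return skill - 0.5 * spread


# A two-member ensemble that brackets the observation scores better (lower)
# than a single deterministic forecast that misses by the same distance.
print(crps_ensemble([0.0, 2.0], 1.0))  # 0.5
print(crps_ensemble([2.0], 1.0))       # 1.0
```

With a single member the spread term vanishes and CRPS reduces to the absolute error, which is why CRPS is often described as a generalization of MAE to probabilistic forecasts.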
Lip Vertex Error (LVE) measures the discrepancy between predicted and ground-truth lip vertex positions to quantify audio-visual synchronization. Localization studios use LVE to validate lip-sync accuracy during film dubbing, while AR/VR developers integrate it into avatar pipelines to ensure natural mouth movements in real-time virtual meetings and social experiences.
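To make the LVE definition concrete, the sketch below follows the formulation common in speech-driven 3D facial animation work: for each frame, take the largest Euclidean distance between a predicted lip vertex and its ground-truth counterpart, then average over frames. It is a hedged illustration using plain Python lists of (x, y, z) tuples; some implementations use the squared distance instead, and the TorchMetrics metric may differ in such details.

```python
import math

Vertex = tuple[float, float, float]


def lip_vertex_error(pred: list[list[Vertex]], target: list[list[Vertex]]) -> float:
    """Mean over frames of the maximum per-vertex L2 error on the lip region."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must contain the same, non-zero number of frames")
    per_frame_max = []
    for pred_frame, target_frame in zip(pred, target):
        if len(pred_frame) != len(target_frame):
            raise ValueError("each frame must have the same number of lip vertices")
        # Worst-case vertex error in this frame.
        per_frame_max.append(max(math.dist(p, t) for p, t in zip(pred_frame, target_frame)))
    return sum(per_frame_max) / len(per_frame_max)


target = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (0.0, 3.0, 4.0)]]
print(lip_vertex_error(pred, target))  # 2.5: frame errors are 0.0 and 5.0
```

Taking the per-frame maximum rather than the mean penalizes a single badly placed vertex (for example, a mouth corner), which is what viewers tend to notice in lip sync.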
[1.8.0] - 2025-07-23
Added
- Added VMAF metric to new video domain (#2991)
- Added CRPS in regression domain (#3024)
- Added aggregation_level argument to DiceScore (#3018)
- Added support for reduction="none" to LearnedPerceptualImagePatchSimilarity (#3053)
- Added support for single str input to the functional interface of bert_score (#3056)
- Enhanced BERTScore to evaluate hypotheses against multiple references (#3069)
- Added Lip Vertex Error (LVE) in multimodal domain (#3090)
- Added antialias argument to FID metric (#3177)
- Added mixed input format to segmentation metrics (#3176)
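The new aggregation_level argument concerns a real ambiguity in Dice: scores can be pooled over the whole batch or computed per sample and then averaged, and the two can disagree sharply when samples differ in size. The sketch below contrasts the two strategies on flattened binary masks. The helper names are hypothetical, and how these strategies map onto aggregation_level's accepted values is an assumption; check the DiceScore documentation for the exact options.

```python
def dice_global(preds: list[list[int]], targets: list[list[int]]) -> float:
    """Pool intersections and sizes over all samples, then compute one Dice score."""
    inter = sum(p * t for pm, tm in zip(preds, targets) for p, t in zip(pm, tm))
    total = sum(sum(pm) + sum(tm) for pm, tm in zip(preds, targets))
    return 2 * inter / total  # not handled here: all masks empty (total == 0)


def dice_samplewise(preds: list[list[int]], targets: list[list[int]]) -> float:
    """Compute Dice per sample, then take the unweighted mean."""
    scores = [dice_global([pm], [tm]) for pm, tm in zip(preds, targets)]
    return sum(scores) / len(scores)


# Sample A: large object, perfectly segmented. Sample B: tiny object, missed.
preds = [[1, 1, 1, 1], [1, 0, 0, 0]]
targets = [[1, 1, 1, 1], [0, 1, 0, 0]]
print(dice_global(preds, targets))      # 0.8
print(dice_samplewise(preds, targets))  # 0.5
```

Pooled aggregation lets large objects dominate, while per-sample averaging gives the missed small object equal weight; which one is appropriate depends on whether every image should count equally.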
Changed
- Changed data_range argument in PSNR metric to be a required argument (#3178)
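Making data_range required removes a silent source of error: PSNR = 10 log10(data_range² / MSE), so the same images give different scores depending on the assumed dynamic range. The pure-Python sketch below (not the TorchMetrics implementation) shows that a matching data_range yields the same score regardless of pixel scale, while a mismatched one inflates it.

```python
import math


def psnr(pred: list[float], target: list[float], data_range: float) -> float:
    """Peak signal-to-noise ratio in dB for two equally sized signals."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(data_range**2 / mse)


# The same two "images" stored in [0, 1] and in [0, 255].
print(psnr([0.5, 0.5], [0.6, 0.4], data_range=1.0))            # ~20.0
print(psnr([127.5, 127.5], [153.0, 102.0], data_range=255.0))  # ~20.0
print(psnr([0.5, 0.5], [0.6, 0.4], data_range=255.0))          # ~68.1, wrong range
```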
Removed
- Removed zero_division argument from DiceScore (#3018)
Key Contributors
@nkaenzig, @rittik9, @simonreise, @SkafteNicki
New Contributors
- @lantiga made their first contribution in #3054
- @AlexVerine made their first contribution in #3057
- @ZhiyuanChen made their first contribution in #3059
- @ahmedhshahin made their first contribution in #3101
- @gratus907 made their first contribution in #3103
- @cyyever made their first contribution in #3118
- @Armannas made their first contribution in #3124
- @alifa98 made their first contribution in #3128
- @simonreise made their first contribution in #3176
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.7.0...v1.8.0
Minor patch release
[1.7.4] - 2025-07-04
Changed
- Improved numerical stability of Pearson's correlation coefficient (#3152)
Fixed
- Fixed: Ignore zero and negative predictions in retrieval metrics (#3160)
- Fixed SSIM dist_reduce_fx when reduction=None for distributed training (#3162, #3166)
- Fixed attribute error (#3154)
- Fixed incorrect shape in _pearson_corrcoef_update (#3168)
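A common cause of instability in Pearson's correlation is the single-pass formula built from raw sums, which subtracts two large, nearly equal numbers when the data has a large offset. The sketch below contrasts it with a two-pass version that centers the data first. It illustrates the general technique only; the changelog does not say how #3152 achieved its improvement.

```python
import math


def pearson_naive(x: list[float], y: list[float]) -> float:
    """Single-pass formula from raw sums; prone to catastrophic cancellation."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))


def pearson_centered(x: list[float], y: list[float]) -> float:
    """Two-pass formula: subtract the means before accumulating products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Perfectly linear data with a large offset.
x = [1e9 + i for i in range(5)]
y = [2 * a for a in x]
print(pearson_centered(x, y))  # 1.0
# pearson_naive(x, y) on this data may return a badly wrong value, or raise
# ValueError if rounding makes one of the variance terms negative.
```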
Key Contributors
@AymenKallala, @gratus907, @Isalia20, @rittik9
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.7.3...v1.7.4
Minor patch release
[1.7.3] - 2025-06-13
Fixed
- Fixed: ensure WrapperMetric resets wrapped_metric state (#3123)
- Fixed top_k in multiclass_accuracy (#3117)
- Fixed compatibility with COCO format for pycocotools 2.0.10 (#3131)
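The top_k fix above concerns top-k accuracy, where a prediction counts as correct when the true class is among the k highest-scoring classes. The sketch below computes the micro-averaged (per-sample) version in plain Python. It is illustrative only; MulticlassAccuracy also supports other averaging modes, and its default may differ from this.

```python
def top_k_accuracy(scores: list[list[float]], targets: list[int], k: int) -> float:
    """Fraction of samples whose true class is among the k highest scores."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted by descending score; keep the first k.
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += target in top
    return hits / len(targets)


scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]
targets = [2, 0, 0]
print(top_k_accuracy(scores, targets, k=1))  # 0.333...
print(top_k_accuracy(scores, targets, k=2))  # 0.666...
```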
Key Contributors
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.7.2...v1.7.3
Minor patch release
[1.7.2] - 2025-05-27
Changed
- Enhance: improve performance of _rank_data (#3103)
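The changelog only says that _rank_data became faster. As context, rank-based metrics such as Spearman's correlation replace values by their ranks, giving tied values the average of the ranks they span. The pure-Python sketch below shows that tie-averaging scheme; it is an assumption about what such a helper computes, not a copy of the private TorchMetrics function.

```python
def rank_data(values: list[float]) -> list[float]:
    """1-based ranks, with ties assigned the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the block of equal values beginning at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    return ranks


print(rank_data([10, 20, 20, 5]))  # [2.0, 3.5, 3.5, 1.0]
```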
Fixed
- Fixed UnboundLocalError in MatthewsCorrCoef (#3059)
- Fixed MIFID incorrectly converting inputs to byte dtype with custom encoders (#3064)
- Fixed ignore_index in MultilabelExactMatch (#3085)
- Fixed: disable non-blocking on MPS (#3101)
Key Contributors
@ahmedhshahin, @gratus907, @rittik9, @ZhiyuanChen
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.7.1...v1.7.2
Minor patch release
[1.7.1] - 2025-04-06
Changed
- Enhanced add_metrics to support adding a MetricCollection to another MetricCollection (#3032)
Fixed
- Fixed absent-class handling in MeanIoU (#2892)
- Fixed detection IoU ignoring predictions without ground truth (#3025)
- Fixed error raised in MulticlassAccuracy when top_k>1 (#3039)
Key Contributors
@Isalia20, @rittik9, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.7.0...v1.7.1
More image metrics
The upcoming release of TorchMetrics is set to deliver a range of innovative features and enhancements across multiple domains, further solidifying its position as a leading tool for machine learning metrics. In the image domain, significant additions include the ARNIQA and DeepImageStructureAndTextureSimilarity metrics, which provide new insights into image quality and similarity. Additionally, the CLIPScore metric now supports more models and processors, expanding its versatility in image-text alignment tasks.
Beyond image analysis, the regression package welcomes the JensenShannonDivergence metric, offering a powerful tool for comparing probability distributions. The clustering package also sees a notable update with the introduction of the ClusterAccuracy metric, which helps evaluate the performance of clustering algorithms more effectively.
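For reference, the Jensen-Shannon divergence between distributions P and Q is the average KL divergence of each from their midpoint M = (P + Q) / 2; unlike KL divergence it is symmetric and always finite. The sketch below uses natural logarithms on plain Python probability lists; the TorchMetrics metric may differ in log base, input format, and reduction, so treat it as an illustration of the formula only.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence: mean KL of P and Q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # ln 2 ~ 0.693, the maximum in nats
print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

Because M is nonzero wherever P or Q is, the KL terms never divide by zero, which is what keeps the divergence finite even for distributions with disjoint support.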
In the realm of classification, the Equal Error Rate (EER) metric has been added. EER is the error rate at the decision threshold where the false positive rate equals the false negative rate, giving a single-number summary of the trade-off between the two that is widely used in biometric verification and speaker recognition. Furthermore, the MeanAveragePrecision metric now includes a functional interface, enhancing its usability and flexibility for users.
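A minimal sketch of how an EER can be read off a set of scores: sweep the decision threshold, compute the false positive rate (FPR) and false negative rate (FNR) at each candidate, and report the rate where the two are closest. This coarse threshold-scan version is for illustration; practical implementations typically interpolate between ROC points, and the TorchMetrics metric may do so.

```python
def equal_error_rate(scores: list[float], labels: list[int]) -> float:
    """Approximate EER by scanning thresholds; label 1 = positive, 0 = negative."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_gap, best_rate = float("inf"), float("nan")
    for threshold in sorted(set(scores)):
        # Predict positive when score >= threshold.
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        fpr, fnr = fp / negatives, fn / positives
        if abs(fpr - fnr) < best_gap:
            best_gap, best_rate = abs(fpr - fnr), (fpr + fnr) / 2
    return best_rate


scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))  # 0.333... (FPR = FNR = 1/3 at threshold 0.7)
```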
These updates collectively enhance the capabilities of TorchMetrics, making it an even more comprehensive and indispensable resource for machine learning practitioners and researchers.
[1.7.0] - 2025-03-20
Added
- Additions to image domain:
- Added JensenShannonDivergence metric to regression package (#2992)
- Added ClusterAccuracy metric to cluster package (#2777)
- Added Equal Error Rate (EER) to classification package (#3013)
- Added functional interface to MeanAveragePrecision metric (#3011)
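The usual definition of cluster accuracy deals with the fact that cluster IDs are arbitrary: the score is the accuracy under the best one-to-one mapping from clusters to classes. The sketch below finds that mapping by brute force over permutations, which is only feasible for a handful of classes; scalable implementations usually solve it as an assignment problem with the Hungarian algorithm. It assumes the number of clusters equals the number of classes and is not the TorchMetrics code.

```python
from itertools import permutations


def cluster_accuracy(preds: list[int], target: list[int], num_classes: int) -> float:
    """Best accuracy over all one-to-one relabelings of cluster IDs (brute force)."""
    best_correct = 0
    for mapping in permutations(range(num_classes)):
        # mapping[c] is the class label assigned to cluster c.
        correct = sum(1 for p, t in zip(preds, target) if mapping[p] == t)
        best_correct = max(best_correct, correct)
    return best_correct / len(target)


# Same grouping as the target, just with different cluster IDs: accuracy 1.0.
print(cluster_accuracy([2, 2, 0, 1], [0, 0, 1, 2], num_classes=3))        # 1.0
print(cluster_accuracy([1, 1, 0, 0, 2], [0, 0, 1, 1, 1], num_classes=3))  # 0.8
```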
Changed
- Made num_classes optional for one-hot inputs in MeanIoU (#3012)
Removed
- Removed Dice from classification (#3017)
Fixed
- Fixed edge case in integration between class-wise wrapper and metric tracker (#3008)
- Fixed IndexError in MulticlassAccuracy when using top_k with a single sample (#3021)
Key Contributors
@Isalia20, @LorenzoAgnolucci, @nathanpainchaud, @rittik9, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.6.0...v1.7.0
Minor patch release
[1.6.3] - 2025-03-13
Fixed
- Fixed logic in how metric state referencing is handled in MetricCollection (#2990)
- Fixed integration between class-wise wrapper and metric tracker (#3004)
Key Contributors
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.6.2...v1.6.3
Minor patch release
[1.6.2] - 2025-02-28
Added
- Added zero_division argument to DiceScore in segmentation package (#2860)
- Added cache_session to DNSMOS metric to control caching behavior (#2974)
- Added disable option to nan_strategy in basic aggregation metrics (#2943)
Changed
- Made num_classes optional for classification in case of micro averaging (#2841)
- Enhanced Clip_Score to calculate similarities between inputs of the same modality (#2875)
Fixed
- Fixed DiceScore when there is zero overlap between predictions and targets (#2860)
- Fixed MeanAveragePrecision for average="micro" when label 0 is not present (#2968)
- Fixed corner case in PearsonCorrCoef when input is constant (#2975)
- Fixed MetricCollection.update giving identical results (#2944)
- Fixed missing kwargs in PIT metric for permutation-wise mode (#2977)
- Fixed multiple errors in the _final_aggregation function for PearsonCorrCoef (#2980)
- Fixed incorrect CLIP-IQA type hints (#2952)
Key Contributors
@baskrahmer, @czmrand, @rbedyakin, @rittik9, @SkafteNicki, @wooseopkim
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.6.1...v1.6.2
Minor patch release
[1.6.1] - 2024-12-25
Changed
- Enabled specifying weights path for FID (#2867)
- Removed a Device2Host transfer caused by communication between device and host (#2840)
Fixed
- Fixed plotting of multilabel confusion matrix (#2858)
- Fixed issue with shared state in metric collection when using dice score (#2848)
- Fixed top_k for multiclassf1score with one-hot encoding (#2839)
- Fixed slow calculations of classification metrics with MPS (#2876)
Key Contributors
@Isalia20, @nkaenzig, @podgorki, @rittik9, @yuvalkirstain, @zhaozheng09
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.6.0...v1.6.1
More metrics
The latest release of TorchMetrics introduces several significant enhancements and new features that will greatly benefit users across various domains. This update includes the addition of new metrics and methods that enhance the library's functionality and usability.
One of the key additions is the NISQA audio metric, which estimates speech quality non-intrusively, without requiring a clean reference signal. In the classification domain, the new LogAUC and NegativePredictiveValue metrics offer improved tools for assessing model performance, particularly on imbalanced datasets. For regression tasks, the NormalizedRootMeanSquaredError metric has been introduced, providing a scale-independent measure of prediction error that makes results comparable across targets of different magnitudes.
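To see why normalization makes the error scale-independent, the sketch below divides the RMSE by either the range or the mean of the targets. The "range" and "mean" option names are assumptions chosen for illustration; consult the NormalizedRootMeanSquaredError documentation for the normalizations it actually supports.

```python
import math


def nrmse(pred: list[float], target: list[float], normalization: str = "range") -> float:
    """RMSE divided by a scale statistic of the target (illustrative options only)."""
    n = len(target)
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)
    if normalization == "range":
        scale = max(target) - min(target)
    elif normalization == "mean":
        scale = sum(target) / n
    else:
        raise ValueError(f"unsupported normalization: {normalization!r}")
    return rmse / scale


# Rescaling both predictions and targets by 100 leaves the score unchanged.
print(nrmse([1.0, 9.0], [0.0, 10.0]))                        # 0.1
print(nrmse([100.0, 900.0], [0.0, 1000.0]))                  # 0.1
print(nrmse([1.0, 9.0], [0.0, 10.0], normalization="mean"))  # 0.2
```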
In the field of image segmentation, the new Dice metric enhances the evaluation of segmentation models by providing a robust measure of overlap between predicted and ground-truth masks. Additionally, the merge_state method has been added to the Metric class, allowing for more efficient state management and aggregation across multiple devices or processes.
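The idea behind merging metric states is that separate metric instances (for example, one per process) can be combined from their raw accumulated states without revisiting the data. The sketch below is a plain-Python illustration with a hypothetical RunningMean class; it does not reproduce the actual signature or behavior of Metric.merge_state.

```python
class RunningMean:
    """Hypothetical metric whose state (total, count) combines by addition."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, values: list[float]) -> None:
        self.total += sum(values)
        self.count += len(values)

    def merge_state(self, other: "RunningMean") -> None:
        # Combine raw states, not final results: averaging two means would be
        # wrong whenever the instances have seen different numbers of values.
        self.total += other.total
        self.count += other.count

    def compute(self) -> float:
        return self.total / self.count


worker_a, worker_b = RunningMean(), RunningMean()
worker_a.update([1.0, 2.0, 3.0])
worker_b.update([10.0])
worker_a.merge_state(worker_b)
print(worker_a.compute())  # 4.0, whereas averaging the two means would give 6.0
```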
Furthermore, this release supports propagation of the autograd graph in Distributed Data-Parallel (DDP) settings, so metric values synchronized across multiple GPUs can stay part of the computation graph, for example when a metric is used as a training objective. These enhancements collectively make TorchMetrics a more powerful and versatile tool for machine learning practitioners, enabling more accurate and efficient model evaluation across a wide range of applications.
[1.6.0] - 2024-11-12
Added
- Added audio metric NISQA (#2792)
- Added classification metric LogAUC (#2377)
- Added classification metric NegativePredictiveValue (#2433)
- Added regression metric NormalizedRootMeanSquaredError (#2442)
- Added segmentation metric Dice (#2725)
- Added method merge_state to Metric (#2786)
- Added support for propagation of the autograd graph in DDP setting (#2754)
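NegativePredictiveValue is the counterpart of precision for the negative class: NPV = TN / (TN + FN), the fraction of negative predictions that are correct. A plain-Python sketch for binary labels follows; returning 0.0 when there are no negative predictions is an arbitrary choice here, not necessarily what TorchMetrics does.

```python
def negative_predictive_value(preds: list[int], target: list[int]) -> float:
    """TN / (TN + FN) for binary labels (1 = positive, 0 = negative)."""
    tn = sum(1 for p, t in zip(preds, target) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(preds, target) if p == 0 and t == 1)
    if tn + fn == 0:
        return 0.0  # no negative predictions; this fallback is an arbitrary choice
    return tn / (tn + fn)


print(negative_predictive_value([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))  # 0.666...
```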
Changed
- Changed naming and input order of arguments in KLDivergence (#2800)
Deprecated
- Deprecated Dice from classification metrics (#2725)
Removed
- Changed minimum supported PyTorch version to 2.0 (#2671)
- Dropped support for Python 3.8 (#2827)
- Removed num_outputs in R2Score (#2800)
Fixed
- Fixed segmentation Dice + GeneralizedDice for 2d index tensors (#2832)
- Fixed mixed results of rouge_score with accumulate='best' (#2830)
Key Contributors
@Borda, @cw-tan, @philgzl, @rittik9, @SkafteNicki
New Contributors since 1.5.0
- @bfolie made their first contribution in #2793
- @StalkerShurik made their first contribution in #2811
- @philgzl made their first contribution in #2792
- @cw-tan made their first contribution in #2754
Full Changelog: v1.5.0...v1.6.0