This repository contains MATLAB and Fortran implementations of Optimal Bias Robust Estimator (OBRE) algorithms for extreme value analysis. The software is designed for robust parameter estimation of the Generalized Pareto Distribution (GPD), which is fundamental in modeling extreme events such as extreme wind speeds, floods, financial losses, and other rare but impactful phenomena.
Extreme Value Theory (EVT) provides the mathematical foundation for analyzing rare events that occur at the extremes of probability distributions. The theory is built on asymptotic results that describe the limiting behavior of extreme order statistics.
The Generalized Pareto Distribution (GPD) is central to the Peaks Over Threshold (POT) method in extreme value analysis. According to the Pickands-Balkema-de Haan theorem, for a wide class of underlying distributions, the conditional distribution of excesses over a high threshold converges to a GPD.
The GPD is defined as:
F(x) = 1 - [1 + ξ(x-μ)/σ]^(-1/ξ) for ξ ≠ 0
F(x) = 1 - exp[-(x-μ)/σ] for ξ = 0
where:
- μ: Location parameter (threshold)
- σ > 0: Scale parameter (α in this implementation)
- ξ: Shape parameter (β in this implementation)
This implementation provides three parameter estimation approaches:
The Method of Moments estimates parameters by equating sample moments to theoretical moments:
α̂ = (μ̂²/σ̂² + 1) × μ̂/2
β̂ = (μ̂²/σ̂² - 1)/2
where μ̂ and σ̂² are the sample mean and variance of excesses.
PWM estimators, introduced by Hosking and Wallis (1987), are based on probability weighted moments:
M₀ = E[X]
M₁ = E[X × F(X)]
The PWM estimators are:
α̂ = 2 × M₀ × M₁ / (M₀ - 2M₁)
β̂ = M₀/(M₀ - 2M₁) - 2
OBRE methodology, developed by Ronchetti (1982, 1987), provides robust parameter estimation that is less sensitive to outliers while maintaining high efficiency under the assumed model.
Ronchetti's Algorithm consists of five steps:
- Initialization: Start with PWM estimates
- Fisher Information: Compute the Fisher information matrix J
- Eigendecomposition: Find eigenvalues and eigenvectors of J
- Iterative Refinement: Update parameter estimates using robust score functions
- Convergence: Iterate until convergence criteria are met
The OBRE approach minimizes the maximum asymptotic bias over a neighborhood of the assumed model, making it robust against model misspecification and outliers.
The primary application domain is structural wind engineering, where accurate estimation of extreme wind speeds is crucial for:
- Building design and safety assessment
- Determination of design wind loads
- Estimation of return period wind speeds (VREF)
The return period formula implemented is:
VREF = μ + (α/β) × [1 - (n/(M×T))^(-β)]
where:
- μ: Threshold wind speed
- n: Number of exceedances
- M: Number of years of record
- T: Return period (years)
- Hydrology: Flood frequency analysis
- Finance: Risk management and Value-at-Risk estimation
- Insurance: Catastrophic loss modeling
- Seismology: Earthquake magnitude analysis
The MATLAB version provides a complete implementation with the following key functions:
obre.m
: Main function implementing the complete OBRE algorithmmeth1.m
: Method of Moments parameter estimationmeth2.m
: Probability Weighted Moments parameter estimationstep345.m
: Steps 3-5 of Ronchetti's algorithm
fishinfo.m
: Fisher information matrix calculationjacobi.m
: Eigenvalue decompositiongoodoffit.m
: Goodness-of-fit testing (Anderson-Darling, Cramér-von Mises)quangp.m
: Quantile estimation for GPDweight.m
: Robust weight function calculationcovmat.m
: Asymptotic covariance matrix estimation
readinfo.m
: Data preprocessing and structure initializationvmean.m
,vstdev.m
: Statistical calculationsdgsint.m
: Numerical integrationscore1.m
,score2.m
: Score function calculations
The Fortran version provides high-performance implementation suitable for intensive computational applications (see fortran/
directory).
% Basic OBRE analysis with default parameters
[alpha, beta, nex, threshold, vref_obre, vref_mom, vref_pwm] = obre(25.8, 50);
% OBRE analysis with custom parameters
config = struct(...
'incr', 0.2, ... % Threshold increment
'rep', 5, ... % Number of threshold repetitions
'c', 1.5, ... % Bound on influence function
'epsiln', 0.001, ... % Convergence tolerance
'factor', 10.0 ... % Integration speed factor
);
[alpha, beta, nex, threshold, vref_obre, vref_mom, vref_pwm, weights, errors] = ...
obre(25.8, 50, config);
initialthreshold
: Initial threshold value for analysisreturnperiod
: Return period for VREF calculation (years)DATINIT
(optional): Configuration structure with fields:incr
: Threshold increment for sensitivity analysisrep
: Number of threshold repetitionsc
: Bound on the influence function (robustness parameter)epsiln
: Convergence tolerance for iterative algorithmfactor
: Integration factor (1.0 = slow, 50.0 = fast)sim
: Number of simulations for goodness-of-fit testing
alpha
: Scale parameter estimates (vector)beta
: Shape parameter estimates (vector)nex
: Number of exceedances for each thresholdthreshold
: Final threshold values usedvref_obre
: VREF estimates using OBREvref_mom
: VREF estimates using Method of Momentsvref_pwm
: VREF estimates using PWMweights
: Robust weights assigned to observationserrors
: Structure containing standard errors and confidence intervals
The software expects MATLAB .mat
files containing:
- Peak values: Vector of observed extreme values
- Independence: Data should be processed for temporal independence
- Stationarity: Underlying process should be stationary
For wind speed analysis, data preprocessing typically includes:
- Peak extraction: Identify local maxima above threshold
- Independence: Ensure temporal independence between peaks
- Stationarity: Verify stationarity of the underlying process
The OBRE estimators satisfy:
- Breakdown Point: 1/n (same as maximum likelihood)
- Bounded Influence: Influence function is bounded by parameter
c
- High Efficiency: Asymptotic efficiency close to maximum likelihood under the model
The implementation provides several goodness-of-fit tests:
A² = -n - (1/n) × Σᵢ[(2i-1) × ln(uᵢ) + (2n+1-2i) × ln(1-uᵢ)]
W² = (1/12n) + Σᵢ[uᵢ - (2i-1)/(2n)]²
where uᵢ are the probability integral transforms.
-
Ronchetti, E. (1982). Robust testing in linear models. PhD Thesis, ETH Zurich.
-
Ronchetti, E. (1987). Robustness aspects of model choice. Statistica Sinica, 7, 327-338.
-
Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics, 3, 119-131.
-
Balkema, A.A. and de Haan, L. (1974). Residual life time at great age. Annals of Probability, 2, 792-804.
-
Hosking, J.R.M. and Wallis, J.R. (1987). Parameter and quantile estimation for the generalized Pareto distribution. Technometrics, 29, 339-349.
-
Davison, A.C. and Smith, R.L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society, Series B, 52, 393-442.
-
Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag.
-
Simiu, E. and Heckert, N.A. (1996). Extreme wind distribution tails: a peaks over threshold approach. Journal of Structural Engineering, 122, 539-547.
-
Cook, N.J. (1985). The Designer's Guide to Wind Loading of Building Structures. Butterworths.
-
Huber, P.J. and Ronchetti, E. (2009). Robust Statistics, 2nd Edition. John Wiley & Sons.
-
Maronna, R.A., Martin, R.D., and Yohai, V.J. (2006). Robust Statistics: Theory and Methods. John Wiley & Sons.
- Original Fortran Implementation: Joanna E. Mills (TUNS, 1996), modified by D.J. Dupuis (1997) and Andrew P. Wilson (1999)
- MATLAB Port: Igor Moiseev (2011)
- Version: 0.1 for MATLAB
This software is distributed under the GNU General Public License v3.0:
OBRE for Matlab
Copyright (C) 2011, Igor Moiseev
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
- MATLAB: R2008a or later
- Fortran: Standard Fortran 90/95 compiler
- Memory: Depends on dataset size (typically < 1GB for most applications)
- Storage: Minimal (algorithms are computationally intensive, not storage intensive)
-
Convergence Problems:
- Reduce
epsiln
tolerance - Increase
factor
for faster integration - Check data for extreme outliers
- Reduce
-
Negative Scale Parameters:
- Indicates poor fit or inappropriate threshold
- Try different threshold values
- Verify data preprocessing
-
High Shape Parameter (β > 1):
- May indicate heavy-tailed distribution
- Check for data quality issues
- Consider alternative distributions
- Use
factor
parameter to balance accuracy vs. speed - For large datasets, consider parallel processing
- Pre-process data to remove obvious outliers
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
- GPU acceleration for large datasets
- Additional robust estimators (M-estimators, S-estimators)
- Automated threshold selection methods
- Extended goodness-of-fit testing suite
If you use this software in your research, please cite:
@misc{moiseev2011obre,
title={OBRE: Optimal Bias Robust Estimator Algorithms},
author={Moiseev, Igor},
year={2011},
doi={10.5281/zenodo.48265},
url={https://github.com/moiseevigor/obre}
}
For questions, bug reports, or collaboration inquiries, please open an issue on GitHub or contact the maintainers.
This software implements state-of-the-art robust statistical methods for extreme value analysis, providing reliable parameter estimation in the presence of model uncertainty and outliers.