
Commit f5a9c0b

Copilot and Anselmoo authored
feat: ✨ add common gradient-based and classical optimizers (#19)
* Initial plan
* Initial exploration and plan for gradient-based optimizers
* Implement gradient-based and classical optimizers
* Fix bounds handling in CG and final validation of optimizers
* Fix pre-commit issues: formatting, type annotations, and linting
* Update README.md with new gradient-based and classical optimizers
* chore: 📝 trim trailing whitespace

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Anselmoo <[email protected]>
Co-authored-by: Anselm Hahn <[email protected]>
1 parent 855d475 commit f5a9c0b

18 files changed: +2216 -6 lines changed

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions

@@ -12,7 +12,7 @@ repos:
     rev: "v0.3.4"
     hooks:
       - id: ruff
-        language_version: python3.10
+        language_version: python3.12
         types_or: [ python, pyi, jupyter ]
       - id: ruff-format
-        language_version: python3.10
+        language_version: python3.12
README.md

Lines changed: 48 additions & 0 deletions
@@ -44,6 +44,26 @@ print(f"Best solution: {best_solution}")
 print(f"Best fitness: {best_fitness}")
 ```

+You can also use the new gradient-based optimizers:
+
+```python
+from opt.stochastic_gradient_descent import SGD
+from opt.adamw import AdamW
+from opt.bfgs import BFGS
+
+# Gradient-based optimization
+sgd = SGD(func=shifted_ackley, lower_bound=-12.768, upper_bound=12.768, dim=2, learning_rate=0.01)
+best_solution, best_fitness = sgd.search()
+
+# Adam variant with weight decay
+adamw = AdamW(func=shifted_ackley, lower_bound=-12.768, upper_bound=12.768, dim=2, weight_decay=0.01)
+best_solution, best_fitness = adamw.search()
+
+# Quasi-Newton method
+bfgs = BFGS(func=shifted_ackley, lower_bound=-12.768, upper_bound=12.768, dim=2, num_restarts=10)
+best_solution, best_fitness = bfgs.search()
+```
+
 ## Current Implemented Optimizer

 The current version of Useful Optimizer includes a wide range of optimization algorithms, each implemented as a separate module. Here's a brief overview of the implemented optimizers:
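As a cross-reference, the opt/adamax.py module introduced later in this commit follows the same constructor pattern as the README example above. A minimal usage sketch, mirroring that file's `__main__` block and assuming the module is importable as `opt.adamax` like the other optimizers:

```python
from opt.adamax import AdaMax  # new module added in this commit (opt/adamax.py)
from opt.benchmark.functions import shifted_ackley

# Infinity-norm Adam variant; defaults (learning_rate=0.002, beta1=0.9, beta2=0.999)
# come from the class signature shown later in this diff.
adamax = AdaMax(func=shifted_ackley, lower_bound=-2.768, upper_bound=2.768, dim=2)
best_solution, best_fitness = adamax.search()
print(f"Best solution: {best_solution}")
print(f"Best fitness: {best_fitness}")
```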
@@ -52,6 +72,12 @@ Sure, here's a brief description of each optimizer:

 - **Adadelta, Adagrad, and Adaptive Moment Estimation**: These are gradient-based optimization algorithms commonly used in machine learning and deep learning.

+- **AdaMax**: This is an Adam variant that uses the infinity norm for the second moment estimation, making it more robust to large gradients.
+
+- **AdamW**: This is an Adam variant with decoupled weight decay that provides better regularization and improved generalization in machine learning.
+
+- **AMSGrad**: This is an Adam variant with non-decreasing second moment estimates that addresses convergence issues in original Adam.
+
 - **Ant Colony Optimization**: This is a nature-inspired algorithm that mimics the behavior of ants to solve optimization problems.

 - **Artificial Fish Swarm Algorithm**: This algorithm simulates the behavior of fish in nature to perform global optimization.
@@ -62,12 +88,16 @@ Sure, here's a brief description of each optimizer:

 - **Bee Algorithm**: This is a population-based search algorithm inspired by the food foraging behavior of honey bee colonies.

+- **BFGS (Broyden-Fletcher-Goldfarb-Shanno)**: This is a quasi-Newton method that approximates the inverse Hessian matrix for efficient second-order optimization.
+
 - **Cat Swarm Optimization**: This algorithm is based on the behavior of cats and distinguishes between two forms of behavior in cats: seeking mode and tracing mode.

 - **CMA-ES (Covariance Matrix Adaptation Evolution Strategy)**: This is an evolutionary algorithm for difficult non-linear non-convex optimization problems in continuous domain.

 - **Colliding Bodies Optimization**: This is a physics-inspired optimization method, based on the collision and explosion of bodies.

+- **Conjugate Gradient**: This is an efficient iterative method for solving systems of linear equations and optimization problems, particularly effective for quadratic functions.
+
 - **Cross Entropy Method**: This is a Monte Carlo method for importance sampling and optimization.

 - **Cuckoo Search**: This is a nature-inspired metaheuristic optimization algorithm, which is based on the obligate brood parasitism of some cuckoo species.
@@ -96,14 +126,26 @@ Sure, here's a brief description of each optimizer:

 - **Imperialist Competitive Algorithm**: This is a socio-politically motivated global search strategy, which is based on the imperialistic competition.

+- **L-BFGS (Limited-memory BFGS)**: This is a limited-memory version of BFGS that is suitable for large-scale optimization problems where storing the full Hessian approximation is impractical.
+
 - **Linear Discriminant Analysis**: This is a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.

+- **Nadam**: This is Nesterov-accelerated Adam that combines the benefits of Adam with Nesterov momentum for improved convergence.
+
+- **Nelder-Mead**: This is a derivative-free simplex-based optimization method that is particularly useful for non-differentiable and noisy functions.
+
+- **Nesterov Accelerated Gradient**: This is an accelerated gradient method that uses lookahead momentum to achieve better convergence rates than standard gradient descent.
+
 - **Particle Filter**: This is a statistical filter technique used to estimate the state of a system where the state model and the measurements are both nonlinear.

 - **Particle Swarm Optimization**: This is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality, mimicking the social behavior of bird flocking or fish schooling.

 - **Parzen Tree Estimator**: This is a non-parametric method to estimate the density function of random variables.

+- **Powell's Method**: This is a derivative-free optimization algorithm that uses conjugate directions to minimize functions without requiring gradient information.
+
+- **RMSprop**: This is an adaptive learning rate optimization algorithm that uses a moving average of squared gradients to normalize the gradient.
+
 - **Shuffled Frog Leaping Algorithm**: This is a metaheuristic optimization algorithm inspired by the memetic evolution of a group of frogs when searching for food.

 - **Simulated Annealing**: This is a probabilistic technique for approximating the global optimum of a given function, mimicking the process of heating a material and then slowly lowering the temperature to decrease defects, thus minimizing the system energy.
@@ -116,10 +158,16 @@ Sure, here's a brief description of each optimizer:

 - **Stochastic Fractal Search**: This is a metaheuristic search algorithm inspired by the natural phenomenon of fractal shapes and Brownian motion.

+- **Stochastic Gradient Descent (SGD)**: This is a fundamental gradient-based optimization algorithm that updates parameters in the direction opposite to the gradient of the objective function.
+
+- **SGD with Momentum**: This is SGD enhanced with momentum that accelerates convergence and helps navigate through local minima by accumulating velocity in consistent gradient directions.
+
 - **Successive Linear Programming**: This is an optimization method for nonlinear optimization problems.

 - **Tabu Search**: This is a metaheuristic search method employing local search methods used for mathematical optimization.

+- **Trust Region**: This is a robust optimization method that iteratively solves optimization problems within a region where a model function is trusted to be an adequate representation.
+
 - **Variable Depth Search**: This is a search algorithm that explores the search space by variable-depth first search and backtracking.

 - **Variable Neighbourhood Search**: This is a metaheuristic search method for discrete optimization problems.

opt/adamax.py

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
"""AdaMax Optimizer.

This module implements the AdaMax optimization algorithm. AdaMax is a variant of Adam
that uses the infinity norm instead of the L2 norm for the second moment estimate.
This makes it less sensitive to outliers in gradients and can be more stable in some cases.

AdaMax performs the following update rule:
    m = beta1 * m + (1 - beta1) * gradient
    u = max(beta2 * u, |gradient|)
    x = x - (learning_rate / (1 - beta1^t)) * (m / u)

where:
    - x: current solution
    - m: first moment estimate (exponential moving average of gradients)
    - u: second moment estimate (exponential moving average of infinity norm of gradients)
    - learning_rate: step size for parameter updates
    - beta1, beta2: exponential decay rates for moment estimates
    - t: time step

Example:
    optimizer = AdaMax(func=objective_function, learning_rate=0.002, beta1=0.9, beta2=0.999,
                       lower_bound=-5, upper_bound=5, dim=2)
    best_solution, best_fitness = optimizer.search()

Attributes:
    func (Callable): The objective function to optimize.
    learning_rate (float): The learning rate for the optimization.
    beta1 (float): Exponential decay rate for first moment estimates.
    beta2 (float): Exponential decay rate for second moment estimates.
    epsilon (float): Small constant for numerical stability.

Methods:
    search(): Perform the AdaMax optimization.
"""

from __future__ import annotations

from typing import TYPE_CHECKING

import numpy as np

from scipy.optimize import approx_fprime

from opt.abstract_optimizer import AbstractOptimizer
from opt.benchmark.functions import shifted_ackley


if TYPE_CHECKING:
    from collections.abc import Callable

    from numpy import ndarray


class AdaMax(AbstractOptimizer):
    """AdaMax optimizer implementation.

    Args:
        func (Callable[[ndarray], float]): The objective function to be optimized.
        lower_bound (float): The lower bound of the search space.
        upper_bound (float): The upper bound of the search space.
        dim (int): The dimensionality of the search space.
        max_iter (int, optional): The maximum number of iterations. Defaults to 1000.
        learning_rate (float, optional): The learning rate. Defaults to 0.002.
        beta1 (float, optional): Exponential decay rate for first moment estimates. Defaults to 0.9.
        beta2 (float, optional): Exponential decay rate for second moment estimates. Defaults to 0.999.
        epsilon (float, optional): Small constant for numerical stability. Defaults to 1e-8.
        seed (int | None, optional): The seed value for random number generation. Defaults to None.
    """

    def __init__(
        self,
        func: Callable[[ndarray], float],
        lower_bound: float,
        upper_bound: float,
        dim: int,
        max_iter: int = 1000,
        learning_rate: float = 0.002,
        beta1: float = 0.9,
        beta2: float = 0.999,
        epsilon: float = 1e-8,
        seed: int | None = None,
    ) -> None:
        """Initialize the AdaMax optimizer."""
        super().__init__(
            func=func,
            lower_bound=lower_bound,
            upper_bound=upper_bound,
            dim=dim,
            max_iter=max_iter,
            seed=seed,
        )
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon

    def search(self) -> tuple[np.ndarray, float]:
        """Perform the AdaMax optimization search.

        Returns:
            tuple[np.ndarray, float]: A tuple containing the best solution found and its fitness value.
        """
        # Initialize solution randomly
        best_solution = np.random.default_rng(self.seed).uniform(
            self.lower_bound, self.upper_bound, self.dim
        )
        best_fitness = self.func(best_solution)

        current_solution = best_solution.copy()
        m = np.zeros(self.dim)  # First moment estimate
        u = np.zeros(self.dim)  # Infinity norm-based second moment estimate

        for t in range(1, self.max_iter + 1):
            # Compute gradient at current position
            gradient = self._compute_gradient(current_solution)

            # Update biased first moment estimate
            m = self.beta1 * m + (1 - self.beta1) * gradient

            # Update the exponentially weighted infinity norm
            u = np.maximum(self.beta2 * u, np.abs(gradient))

            # Compute bias-corrected first moment estimate
            bias_correction = 1 - np.power(self.beta1, t)

            # Update solution using AdaMax rule
            current_solution = current_solution - (
                self.learning_rate / bias_correction
            ) * (m / (u + self.epsilon))

            # Apply bounds
            current_solution = np.clip(
                current_solution, self.lower_bound, self.upper_bound
            )

            # Evaluate fitness
            current_fitness = self.func(current_solution)

            # Update best solution if improved
            if current_fitness < best_fitness:
                best_solution = current_solution.copy()
                best_fitness = current_fitness

        return best_solution, best_fitness

    def _compute_gradient(self, x: np.ndarray) -> np.ndarray:
        """Compute the gradient of the objective function at a given point.

        Args:
            x (np.ndarray): The point at which to compute the gradient.

        Returns:
            np.ndarray: The gradient vector.
        """
        epsilon = np.sqrt(np.finfo(float).eps)
        return approx_fprime(x, self.func, epsilon)


if __name__ == "__main__":
    optimizer = AdaMax(
        func=shifted_ackley, lower_bound=-2.768, upper_bound=+2.768, dim=2
    )
    best_solution, best_fitness = optimizer.search()
    print(f"Best solution: {best_solution}")
    print(f"Best fitness: {best_fitness}")
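For intuition, the update rule quoted in the module docstring above can be exercised on its own. The standalone sketch below applies it to f(x) = sum(x**2) with the analytic gradient 2*x instead of `approx_fprime`, using the class's default hyperparameters; it is an illustration, not repository code.

```python
import numpy as np

learning_rate, beta1, beta2, epsilon = 0.002, 0.9, 0.999, 1e-8  # defaults from AdaMax above
x = np.array([0.8, -0.5])  # current solution
m = np.zeros_like(x)       # first moment estimate
u = np.zeros_like(x)       # exponentially weighted infinity norm

for t in range(1, 1001):
    gradient = 2.0 * x                           # analytic gradient of sum(x**2)
    m = beta1 * m + (1 - beta1) * gradient       # m = beta1*m + (1 - beta1)*g
    u = np.maximum(beta2 * u, np.abs(gradient))  # u = max(beta2*u, |g|)
    x = x - (learning_rate / (1 - beta1**t)) * (m / (u + epsilon))

print(x)  # approaches the origin, the function's minimum
```

The `1 - beta1**t` factor is the same bias correction applied inside `search()`, compensating for `m` starting at zero.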

0 commit comments