
Commit f5a9c0b

Copilot and Anselmoo authored
feat: ✨ add common gradient-based and classical optimizers (#19)
* Initial plan
* Initial exploration and plan for gradient-based optimizers
* Implement gradient-based and classical optimizers
* Fix bounds handling in CG and final validation of optimizers
* Fix pre-commit issues: formatting, type annotations, and linting
* Update README.md with new gradient-based and classical optimizers
* chore: 📝 trim trailing whitespace

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Anselmoo <[email protected]>
Co-authored-by: Anselm Hahn <[email protected]>
1 parent 855d475 commit f5a9c0b

18 files changed: +2216 -6 lines changed

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions

@@ -12,7 +12,7 @@ repos:
     rev: "v0.3.4"
     hooks:
       - id: ruff
-        language_version: python3.10
+        language_version: python3.12
         types_or: [ python, pyi, jupyter ]
       - id: ruff-format
-        language_version: python3.10
+        language_version: python3.12
README.md

Lines changed: 48 additions & 0 deletions
@@ -44,6 +44,26 @@ print(f"Best solution: {best_solution}")
 print(f"Best fitness: {best_fitness}")
 ```

+You can also use the new gradient-based optimizers:
+
+```python
+from opt.stochastic_gradient_descent import SGD
+from opt.adamw import AdamW
+from opt.bfgs import BFGS
+
+# Gradient-based optimization
+sgd = SGD(func=shifted_ackley, lower_bound=-12.768, upper_bound=12.768, dim=2, learning_rate=0.01)
+best_solution, best_fitness = sgd.search()
+
+# Adam variant with weight decay
+adamw = AdamW(func=shifted_ackley, lower_bound=-12.768, upper_bound=12.768, dim=2, weight_decay=0.01)
+best_solution, best_fitness = adamw.search()
+
+# Quasi-Newton method
+bfgs = BFGS(func=shifted_ackley, lower_bound=-12.768, upper_bound=12.768, dim=2, num_restarts=10)
+best_solution, best_fitness = bfgs.search()
+```
+
 ## Current Implemented Optimizer

 The current version of Useful Optimizer includes a wide range of optimization algorithms, each implemented as a separate module. Here's a brief overview of the implemented optimizers:
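As a cross-reference, the opt/adamax.py module introduced later in this commit follows the same constructor pattern as the README example above. A minimal usage sketch, mirroring that file's `__main__` block and assuming the module is importable as `opt.adamax` like the other optimizers:

```python
from opt.adamax import AdaMax  # new module added in this commit (opt/adamax.py)
from opt.benchmark.functions import shifted_ackley

# Infinity-norm Adam variant; defaults (learning_rate=0.002, beta1=0.9, beta2=0.999)
# come from the class signature shown later in this diff.
adamax = AdaMax(func=shifted_ackley, lower_bound=-2.768, upper_bound=2.768, dim=2)
best_solution, best_fitness = adamax.search()
print(f"Best solution: {best_solution}")
print(f"Best fitness: {best_fitness}")
```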
@@ -52,6 +72,12 @@ Sure, here's a brief description of each optimizer:

 - **Adadelta, Adagrad, and Adaptive Moment Estimation**: These are gradient-based optimization algorithms commonly used in machine learning and deep learning.

+- **AdaMax**: This is an Adam variant that uses the infinity norm for the second moment estimation, making it more robust to large gradients.
+
+- **AdamW**: This is an Adam variant with decoupled weight decay that provides better regularization and improved generalization in machine learning.
+
+- **AMSGrad**: This is an Adam variant with non-decreasing second moment estimates that addresses convergence issues in original Adam.
+
 - **Ant Colony Optimization**: This is a nature-inspired algorithm that mimics the behavior of ants to solve optimization problems.

 - **Artificial Fish Swarm Algorithm**: This algorithm simulates the behavior of fish in nature to perform global optimization.
@@ -62,12 +88,16 @@ Sure, here's a brief description of each optimizer:

 - **Bee Algorithm**: This is a population-based search algorithm inspired by the food foraging behavior of honey bee colonies.

+- **BFGS (Broyden-Fletcher-Goldfarb-Shanno)**: This is a quasi-Newton method that approximates the inverse Hessian matrix for efficient second-order optimization.
+
 - **Cat Swarm Optimization**: This algorithm is based on the behavior of cats and distinguishes between two forms of behavior in cats: seeking mode and tracing mode.

 - **CMA-ES (Covariance Matrix Adaptation Evolution Strategy)**: This is an evolutionary algorithm for difficult non-linear non-convex optimization problems in continuous domain.

 - **Colliding Bodies Optimization**: This is a physics-inspired optimization method, based on the collision and explosion of bodies.

+- **Conjugate Gradient**: This is an efficient iterative method for solving systems of linear equations and optimization problems, particularly effective for quadratic functions.
+
 - **Cross Entropy Method**: This is a Monte Carlo method for importance sampling and optimization.

 - **Cuckoo Search**: This is a nature-inspired metaheuristic optimization algorithm, which is based on the obligate brood parasitism of some cuckoo species.
@@ -96,14 +126,26 @@ Sure, here's a brief description of each optimizer:

 - **Imperialist Competitive Algorithm**: This is a socio-politically motivated global search strategy, which is based on the imperialistic competition.

+- **L-BFGS (Limited-memory BFGS)**: This is a limited-memory version of BFGS that is suitable for large-scale optimization problems where storing the full Hessian approximation is impractical.
+
 - **Linear Discriminant Analysis**: This is a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.

+- **Nadam**: This is Nesterov-accelerated Adam that combines the benefits of Adam with Nesterov momentum for improved convergence.
+
+- **Nelder-Mead**: This is a derivative-free simplex-based optimization method that is particularly useful for non-differentiable and noisy functions.
+
+- **Nesterov Accelerated Gradient**: This is an accelerated gradient method that uses lookahead momentum to achieve better convergence rates than standard gradient descent.
+
 - **Particle Filter**: This is a statistical filter technique used to estimate the state of a system where the state model and the measurements are both nonlinear.

 - **Particle Swarm Optimization**: This is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality, mimicking the social behavior of bird flocking or fish schooling.

 - **Parzen Tree Estimator**: This is a non-parametric method to estimate the density function of random variables.

+- **Powell's Method**: This is a derivative-free optimization algorithm that uses conjugate directions to minimize functions without requiring gradient information.
+
+- **RMSprop**: This is an adaptive learning rate optimization algorithm that uses a moving average of squared gradients to normalize the gradient.
+
 - **Shuffled Frog Leaping Algorithm**: This is a metaheuristic optimization algorithm inspired by the memetic evolution of a group of frogs when searching for food.

 - **Simulated Annealing**: This is a probabilistic technique for approximating the global optimum of a given function, mimicking the process of heating a material and then slowly lowering the temperature to decrease defects, thus minimizing the system energy.
@@ -116,10 +158,16 @@ Sure, here's a brief description of each optimizer:

 - **Stochastic Fractal Search**: This is a metaheuristic search algorithm inspired by the natural phenomenon of fractal shapes and Brownian motion.

+- **Stochastic Gradient Descent (SGD)**: This is a fundamental gradient-based optimization algorithm that updates parameters in the direction opposite to the gradient of the objective function.
+
+- **SGD with Momentum**: This is SGD enhanced with momentum that accelerates convergence and helps navigate through local minima by accumulating velocity in consistent gradient directions.
+
 - **Successive Linear Programming**: This is an optimization method for nonlinear optimization problems.

 - **Tabu Search**: This is a metaheuristic search method employing local search methods used for mathematical optimization.

+- **Trust Region**: This is a robust optimization method that iteratively solves optimization problems within a region where a model function is trusted to be an adequate representation.
+
 - **Variable Depth Search**: This is a search algorithm that explores the search space by variable-depth first search and backtracking.

 - **Variable Neighbourhood Search**: This is a metaheuristic search method for discrete optimization problems.

opt/adamax.py

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
"""AdaMax Optimizer.

This module implements the AdaMax optimization algorithm. AdaMax is a variant of Adam
that uses the infinity norm instead of the L2 norm for the second moment estimate.
This makes it less sensitive to outliers in gradients and can be more stable in some cases.

AdaMax performs the following update rule:
    m = beta1 * m + (1 - beta1) * gradient
    u = max(beta2 * u, |gradient|)
    x = x - (learning_rate / (1 - beta1^t)) * (m / u)

where:
    - x: current solution
    - m: first moment estimate (exponential moving average of gradients)
    - u: second moment estimate (exponential moving average of infinity norm of gradients)
    - learning_rate: step size for parameter updates
    - beta1, beta2: exponential decay rates for moment estimates
    - t: time step

Example:
    optimizer = AdaMax(func=objective_function, learning_rate=0.002, beta1=0.9, beta2=0.999,
                       lower_bound=-5, upper_bound=5, dim=2)
    best_solution, best_fitness = optimizer.search()

Attributes:
    func (Callable): The objective function to optimize.
    learning_rate (float): The learning rate for the optimization.
    beta1 (float): Exponential decay rate for first moment estimates.
    beta2 (float): Exponential decay rate for second moment estimates.
    epsilon (float): Small constant for numerical stability.

Methods:
    search(): Perform the AdaMax optimization.
"""

from __future__ import annotations

from typing import TYPE_CHECKING

import numpy as np

from scipy.optimize import approx_fprime

from opt.abstract_optimizer import AbstractOptimizer
from opt.benchmark.functions import shifted_ackley


if TYPE_CHECKING:
    from collections.abc import Callable

    from numpy import ndarray


class AdaMax(AbstractOptimizer):
    """AdaMax optimizer implementation.

    Args:
        func (Callable[[ndarray], float]): The objective function to be optimized.
        lower_bound (float): The lower bound of the search space.
        upper_bound (float): The upper bound of the search space.
        dim (int): The dimensionality of the search space.
        max_iter (int, optional): The maximum number of iterations. Defaults to 1000.
        learning_rate (float, optional): The learning rate. Defaults to 0.002.
        beta1 (float, optional): Exponential decay rate for first moment estimates. Defaults to 0.9.
        beta2 (float, optional): Exponential decay rate for second moment estimates. Defaults to 0.999.
        epsilon (float, optional): Small constant for numerical stability. Defaults to 1e-8.
        seed (int | None, optional): The seed value for random number generation. Defaults to None.
    """

    def __init__(
        self,
        func: Callable[[ndarray], float],
        lower_bound: float,
        upper_bound: float,
        dim: int,
        max_iter: int = 1000,
        learning_rate: float = 0.002,
        beta1: float = 0.9,
        beta2: float = 0.999,
        epsilon: float = 1e-8,
        seed: int | None = None,
    ) -> None:
        """Initialize the AdaMax optimizer."""
        super().__init__(
            func=func,
            lower_bound=lower_bound,
            upper_bound=upper_bound,
            dim=dim,
            max_iter=max_iter,
            seed=seed,
        )
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon

    def search(self) -> tuple[np.ndarray, float]:
        """Perform the AdaMax optimization search.

        Returns:
            tuple[np.ndarray, float]: A tuple containing the best solution found and its fitness value.
        """
        # Initialize solution randomly
        best_solution = np.random.default_rng(self.seed).uniform(
            self.lower_bound, self.upper_bound, self.dim
        )
        best_fitness = self.func(best_solution)

        current_solution = best_solution.copy()
        m = np.zeros(self.dim)  # First moment estimate
        u = np.zeros(self.dim)  # Infinity norm-based second moment estimate

        for t in range(1, self.max_iter + 1):
            # Compute gradient at current position
            gradient = self._compute_gradient(current_solution)

            # Update biased first moment estimate
            m = self.beta1 * m + (1 - self.beta1) * gradient

            # Update the exponentially weighted infinity norm
            u = np.maximum(self.beta2 * u, np.abs(gradient))

            # Compute bias-corrected first moment estimate
            bias_correction = 1 - np.power(self.beta1, t)

            # Update solution using AdaMax rule
            current_solution = current_solution - (
                self.learning_rate / bias_correction
            ) * (m / (u + self.epsilon))

            # Apply bounds
            current_solution = np.clip(
                current_solution, self.lower_bound, self.upper_bound
            )

            # Evaluate fitness
            current_fitness = self.func(current_solution)

            # Update best solution if improved
            if current_fitness < best_fitness:
                best_solution = current_solution.copy()
                best_fitness = current_fitness

        return best_solution, best_fitness

    def _compute_gradient(self, x: np.ndarray) -> np.ndarray:
        """Compute the gradient of the objective function at a given point.

        Args:
            x (np.ndarray): The point at which to compute the gradient.

        Returns:
            np.ndarray: The gradient vector.
        """
        epsilon = np.sqrt(np.finfo(float).eps)
        return approx_fprime(x, self.func, epsilon)


if __name__ == "__main__":
    optimizer = AdaMax(
        func=shifted_ackley, lower_bound=-2.768, upper_bound=+2.768, dim=2
    )
    best_solution, best_fitness = optimizer.search()
    print(f"Best solution: {best_solution}")
    print(f"Best fitness: {best_fitness}")
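For intuition, the update rule quoted in the module docstring above can be exercised on its own. The standalone sketch below applies it to f(x) = sum(x**2) with the analytic gradient 2*x instead of `approx_fprime`, using the class's default hyperparameters; it is an illustration, not repository code.

```python
import numpy as np

learning_rate, beta1, beta2, epsilon = 0.002, 0.9, 0.999, 1e-8  # defaults from AdaMax above
x = np.array([0.8, -0.5])  # current solution
m = np.zeros_like(x)       # first moment estimate
u = np.zeros_like(x)       # exponentially weighted infinity norm

for t in range(1, 1001):
    gradient = 2.0 * x                           # analytic gradient of sum(x**2)
    m = beta1 * m + (1 - beta1) * gradient       # m = beta1*m + (1 - beta1)*g
    u = np.maximum(beta2 * u, np.abs(gradient))  # u = max(beta2*u, |g|)
    x = x - (learning_rate / (1 - beta1**t)) * (m / (u + epsilon))

print(x)  # approaches the origin, the function's minimum
```

The `1 - beta1**t` factor is the same bias correction applied inside `search()`, compensating for `m` starting at zero.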

0 commit comments