Creating a big ticket using the list from/for the paper, closing smaller issues.
Ultimately, this should be closed in favour of much more specific tickets ("implement method x")
Optimisers
Semi-organised list of optimisation methods.
Italicised bits are my attempt at a one-line summary
Reference should be given if possible.
Things of interest:
Constrained or unconstrained (or (un)constrained with caveats)
Sometimes either one is required
Derivatives used:
Direct search: Sample f(x), but don't try to approximate or use a gradient
Smoothness requirements
Should f be differentiable? Twice-differentiable?
Heuristic vs comes-with-proof-of-convergence
Note though: Convergence doesn't mean the answer is the global best
So maybe just ignore this?
Derivative-free Direct methods (aka Search methods)
Completely ignore the idea of f having a derivative
Brute-force exploration
Look at the landscape and pick the lowest point
Constrained
Uniform sampling
Latin hypercube sampling
Should we generalise this as sampling-based optimisation?
Random search
Coordinate search (aka Coordinate descent, aka Compass search)
Evaluate f at nearby points at distance d in each direction, if better go there and end iteration, if no better point found decrease d
See book: "Introduction to derivative-free optimization"
Type of "hill climbing" algorithm
Adaptive coordinate descent
Perform CS, but adapt (transform) the coordinate vectors to obtain maximum decorrelation along coordinates
Stochastic coordinate descent (aka stochastic hill climbing)
Sometimes go the wrong way
Search & poll / Pattern search
Search: Evaluate f at a finite number of points, jump to lower if found
Poll: If no lower f found, perform local search, with some iteratively decreasing distance parameter
Fancy versions can use e.g. surrogate models
Should converge to a mode, provided f is smooth
Rosenbrock's method (1960) https://doi.org/10.1093/comjnl/3.3.175
Audett & Dennis: Pattern search filter methods (2004)
Generalized pattern search (GPS)
Generating set search (GSS)
Mesh adaptive search (MADS)
Simplex methods
Simplex = extension of line segment / triangle / tetrahedron to any dimension
Nelder-Mead (aka downhill simplex)
Maintain a simplex, at each iteration replace its worst point by a value determined by the values of the other points
Many extensions to avoid getting stuck
Unconstrained
Subplex
Nelder-Mead on "a sequence of subspaces"
Tabu search
Perform local search (e.g. random or coordinate search), but allow moves to worse f if at an optimum; but remember previously visited points and disallow those (mark them taboo).
Dividing rectangles algorithm (DIRECT)
Partitions the (bounded) subspace into rectangles, and works out where the optimum could be for any value of the Livschitz constant (which says how fast a function varies with its input), then subdivides all rectangles where it could possible be
Powell's conjugate direction method
Brent's method (aka Brent-Dekker method)
Golden-section line search
Define a range around the optimum and then disect it using golden ratio cuts
Multi-start methods
Start from several points to avoid local optima
Multi-level single-linkage (MLSL)
Multi-start from uniform places densely in the search space should find all optima, but is expensive, so use clustering methods to try and identify clusters of starting points with the same attractor
Basin-hopping
Repeatedly perform and then perturb a local search
Derivative-free evolutionary methods and metaheuristics
Maintain some population of points that gets updated at each iteration
Genetic algorithms (GA)
Generate new points using crossover and mutation, estimate the fitness of all points in the population, remove the points with the worst scores
Note this depends on a ranking of fitness, so it may be possible to avoid evaluating f(x) if something that gives the same rankings is available
Differential evolution
Loop over each point in a population, generate a new proposal for each point (as in GA), and replace point with that proposal if its f is lower
Differential evolution
Self-adaptive DE (jDE, iDE, and pDE)
Evolution strategies
Like GA, but mutate by adding a normally distributed random value to each particle - no crossover
Can use ranking instead of objective function (see GA)
Stochastic ranking for constrained evolutionary optimisation
Improved stochastic ranking evolution strategy (ISRES)
(N+1)-ES Simple Evolutionary Algorithm
Controlled random search (CRS)
Evolve a random population using a Heuristic that's a bit like a randomised Nelder-Mead iteration
Swarm algorithms
Particle-swarm optimisation
Several particles buzz around the search space randomly, but with a bias towards the previous personal and global best
Unconstrained / Constrained-with-caveats
Particle-based method
Evolutionary / biologically inspired method
No proof of convergence
Global method
Ant colony optimization (see wikipedia)
Artificial bee colony algorithm (see wikipedia)
Artifical swarm intelligence
Bees algorihtm (see wikipedia)
Cuckoo search (see wikipedia)
Glowworm swarm optimization (see wikipedia)
Particle swarm optimization generational (GPSO)
Metaheuristics
Adaptive dimensional search (see wikipedia)
Bat algorithm (see wikipedia)
Colliding bodies optimisation (see wikipedia)
Cuttlefish optimization (see wikipedia)
Duelist algorithm (see wikipedia)
Flower polination algorithm (see wikipedia)
Firefly algorithm (see wikipedia)
Gaussian adaptation (see wikipedia)
Apparently "based on information theory"
Gravitational search algorithm (see wikipedia)
Harmony search (see wikipedia)
Hunting search (see wikipedia)
Hydrological cycle algorithm (see wikipedia)
Imperialist competitive algorithm (see wikipedia)
Intelligent water drops algorithm (see wikipedia)
Killer whale algorithm (see wikipedia)
Mass and energy balances algorithm (see wikipedia)
Memetic algorithm (see wikipedia)
Rain water algorithm (see wikipedia)
River formation dyanmics (see wikipedia)
Spiral optiomization (see wikipedia)
Bayesian optimistation
Treat f as an unknown distribution, define a prior over it, generate samples from f, combine with prior to form posterior, repeat.
Gradient-estimating methods
Assume f' exists, and then approximate it
Finite difference methods
Approximate f' with finite differences, and find its root
Quasi-newton methods
"Iteratively guess the root is where the tangent hits the x-axis"
Unconstrained
Simplex gradient methods / Implicit filtering
"Line-search using the simplex gradient"
Natural evolution strategies (NES)
Use a search population to estimate the "natural gradient", a gradient which takes different scaling of the coordinates into account
Covariance matrix adaptation evolution strategy (CMA-ES)
Exponential natural evolution strategy (xNES)
Separable natural evolution strategy (SNES)
Surrogate-model methods
Replace f by some function g for which g' is easy to find
Trust-region methods
Select a region of search space, approximate f (typically with a quadratic function), and move to its minimum
Powell's Constrained Optimisation By Linear approximations COBYLA (constrained), linear function
Powell's UOBYQA (unconstrained), quadratic function
Powell's NEWUOA (unconstrained), quadratic function
Powell's BOBYQA (constrained), quadratic function
Powell's LINCOA (constrained), quadratic function
Trust region reflective method (TRR)
Dog-leg trust-region algorithm
Data-based Online Nonlinear Extremumseeker (DONE)
Use Fourier-based model, then optimize on that with L-BFGS
Methods requiring the 1st-order gradient
Root finding methods
Find a point where f'(x) = 0 (and hope there's only one)
Newton's method
_Iteratively guess that the root is where the tangent hits the x-axis
Unconstrained
Truncated Newton methods
Levenberg-Marquardt algorithm
Broyden-Fletcher-Goldfarb-Shanno (BFGS)
Attempts to approximate hessian, works best if function is twice-differentiable
Unconstrained
BFGS-B is constrained
Limited-memory BFGS (L-BFGS or LM-BFGS)
Like BFGS but with linear memory requirement, so good for high numbers of variables
Gradient descent (aka Steepest descent)
Unconstrained / Constrained-with-projection
Basic gradient descent
Walk down the steepest direction
Step size is fraction of gradient: sensitive parameter!
Conjugate gradient descent
Stochastic gradient descent (SGD)
Approximate or partially evaluate the gradient
For example, calculate the gradient in each direction separately, and perform a separate step in each direction
Implicit updates (ISGD)
Use gradient information after the step
Adaptive subgradient methods (AdaGrad) AdaGrad optimiser #1468
RMSProp or AdaDelta (very similar) AdaDelta optimiser #1470
Adaptive Moment Estimation (Adam)
Natural gradient stochastic descent
Continuation
Define a class of functions F(x,k), so that f'(x)=F(x,1), and some solution F(x*, 0) = 0 is known, then 'continue' k to find the x where f'(x)=0
Others
Methods requiring the 2nd-order gradient
?
Good resources for optimisers:
Creating a big ticket using the list from/for the paper, closing smaller issues.
Ultimately, this should be closed in favour of much more specific tickets ("implement method x")
Optimisers
Semi-organised list of optimisation methods.
Italicised bits are my attempt at a one-line summary
Reference should be given if possible.
Things of interest:
Derivative-free Direct methods (aka Search methods)
Completely ignore the idea of f having a derivative
Brute-force exploration
Look at the landscape and pick the lowest point
Random search
Random search (RS)
Random optimization
Simulated annealing
Stochastic tunneling (STUN)
Parallel tempering
Coordinate search (aka Coordinate descent, aka Compass search)
Evaluate f at nearby points at distance d in each direction, if better go there and end iteration, if no better point found decrease d
Search & poll / Pattern search
Simplex methods
Nelder-Mead (aka downhill simplex)Nelder-Mead on "a sequence of subspaces"
Tabu search
Perform local search (e.g. random or coordinate search), but allow moves to worse f if at an optimum; but remember previously visited points and disallow those (mark them taboo).
Dividing rectangles algorithm (DIRECT)
Partitions the (bounded) subspace into rectangles, and works out where the optimum could be for any value of the Livschitz constant (which says how fast a function varies with its input), then subdivides all rectangles where it could possible be
Powell's conjugate direction method
Multi-start methods
Start from several points to avoid local optima
Multi-level single-linkage (MLSL)
Multi-start from uniform places densely in the search space should find all optima, but is expensive, so use clustering methods to try and identify clusters of starting points with the same attractor
Basin-hopping
Repeatedly perform and then perturb a local search
Derivative-free evolutionary methods and metaheuristics
Maintain some population of points that gets updated at each iteration
Genetic algorithms (GA)
Generate new points using crossover and mutation, estimate the fitness of all points in the population, remove the points with the worst scores
Differential evolution
Loop over each point in a population, generate a new proposal for each point (as in GA), and replace point with that proposal if its f is lower
Evolution strategies
Like GA, but mutate by adding a normally distributed random value to each particle - no crossover
Can use ranking instead of objective function (see GA)
Stochastic ranking for constrained evolutionary optimisation
Improved stochastic ranking evolution strategy (ISRES)
(N+1)-ES Simple Evolutionary Algorithm
Controlled random search (CRS)
Evolve a random population using a Heuristic that's a bit like a randomised Nelder-Mead iteration
Swarm algorithms
Particle-swarm optimisationAnt colony optimization (see wikipedia)
Artificial bee colony algorithm (see wikipedia)
Artifical swarm intelligence
Bees algorihtm (see wikipedia)
Cuckoo search (see wikipedia)
Glowworm swarm optimization (see wikipedia)
Particle swarm optimization generational (GPSO)
Metaheuristics
Bayesian optimistation
Treat f as an unknown distribution, define a prior over it, generate samples from f, combine with prior to form posterior, repeat.
Gradient-estimating methods
Assume f' exists, and then approximate it
Finite difference methods
Approximate f' with finite differences, and find its root
Simplex gradient methods / Implicit filtering
"Line-search using the simplex gradient"
Natural evolution strategies (NES)
Use a search population to estimate the "natural gradient", a gradient which takes different scaling of the coordinates into account
Covariance matrix adaptation evolution strategy (CMA-ES)Exponential natural evolution strategy (xNES)Separable natural evolution strategy (SNES)Surrogate-model methods
Replace f by some function g for which g' is easy to find
Trust-region methods
Select a region of search space, approximate f (typically with a quadratic function), and move to its minimum
Data-based Online Nonlinear Extremumseeker (DONE)
Use Fourier-based model, then optimize on that with L-BFGS
Methods requiring the 1st-order gradient
Root finding methods
Find a point where f'(x) = 0 (and hope there's only one)
Gradient descent (aka Steepest descent)
Basic gradient descentAdaptive subgradient methods (AdaGrad)AdaGrad optimiser #1468RMSProp or AdaDelta (very similar)AdaDelta optimiser #1470Adaptive Moment Estimation (Adam)Continuation
Define a class of functions F(x,k), so that f'(x)=F(x,1), and some solution F(x*, 0) = 0 is known, then 'continue' k to find the x where f'(x)=0
Others
Conservative convex separable approximation (CCSA)
Shifted limited-memory variable metrics methods
Methods requiring the 2nd-order gradient
?
Good resources for optimisers: