An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration), new individuals (candidate solutions, denoted as $x$) are generated by variation, usually in a stochastic way, of the current parental individuals.
Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value $f(x)$. Like this, individuals with better and better $f$-values are generated over the generation sequence.
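A minimal Python sketch of this variation–selection loop follows; the Gaussian mutation, the population sizes, and the sphere test function are illustrative choices, not part of any particular standard algorithm.

```python
import numpy as np

def evolve(f, m0, sigma=0.3, lam=20, mu=5, iterations=100):
    """Minimal (mu, lambda) evolutionary loop: Gaussian variation, rank-based selection."""
    rng = np.random.default_rng(0)
    m = np.asarray(m0, dtype=float)
    for _ in range(iterations):
        # Variation: stochastically perturb the current parent.
        offspring = m + sigma * rng.standard_normal((lam, m.size))
        # Selection: keep the mu best individuals by objective value f(x).
        ranked = offspring[np.argsort([f(x) for x in offspring])]
        m = ranked[:mu].mean(axis=0)  # recombination of the selected parents
    return m

sphere = lambda x: float(x @ x)
print(evolve(sphere, m0=[1.0, 2.0, 3.0]))  # approaches the optimum at 0
```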
In contrast to most classical methods, fewer assumptions on the underlying objective function are made.
Because only a ranking (or, equivalently, sorting) of candidate solutions is exploited, neither derivatives nor even an (explicit) objective function is required by the method.
Two main principles for the adaptation of parameters of the search distribution are exploited in the CMA-ES algorithm.
First, a maximum-likelihood principle, based on the idea of increasing the probability of successful candidate solutions and search steps.
The mean of the distribution is updated such that the likelihood of previously successful candidate solutions is maximized.
The covariance matrix of the distribution is updated (incrementally) such that the likelihood of previously successful search steps is increased.
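A sketch of these two maximum-likelihood-style updates, assuming `xs` is a $(\lambda, n)$ NumPy array of sampled candidates; the recombination weights and the learning rate `c_mu` are illustrative choices:

```python
import numpy as np

def ml_updates(m, sigma, C, xs, fvals, mu=None, c_mu=0.3):
    """Maximum-likelihood-style mean update and rank-mu covariance update (sketch)."""
    mu = mu or len(xs) // 2
    order = np.argsort(fvals)                 # best (smallest f-value) first
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                              # positive, decreasing recombination weights
    sel = xs[order[:mu]]                      # the mu most successful candidates
    m_new = w @ sel                           # weighted mean: maximizes their likelihood
    y = (sel - m) / sigma                     # successful *steps*, old mean as reference
    C_new = (1 - c_mu) * C + c_mu * (w[:, None] * y).T @ y   # pull C towards step covariance
    return m_new, C_new
```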
Second, two paths of the time evolution of the distribution mean are recorded, called search or evolution paths; if consecutive steps are taken in a similar direction, these paths become long. One path is used for the covariance matrix adaptation procedure in place of single successful search steps and facilitates a possibly much faster variance increase of favorable directions. The other path is used to conduct an additional step-size control.
This step-size control aims to make consecutive movements of the distribution mean orthogonal in expectation.
In each iteration, new candidate solutions are sampled as $x_i \sim m_k + \sigma_k\,\mathcal{N}(0, C_k)$, i.e. $x_i \sim \mathcal{N}(m_k, \sigma_k^2 C_k)$, for $i = 1, \dots, \lambda$. The second expression suggests the interpretation as unbiased perturbation (mutation) of the current favorite solution vector $m_k$ (the distribution mean vector).
The only feedback used from the objective function here and in the following is an ordering of the sampled candidate solutions due to the indices $i:\lambda$, defined such that $f(x_{1:\lambda}) \le f(x_{2:\lambda}) \le \dots \le f(x_{\lambda:\lambda})$.
The step-size $\sigma_k$ is updated using cumulative step-size adaptation (CSA), sometimes also denoted as path length control.
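A sketch of the CSA update under these definitions; the constants `c_sigma` and `d_sigma`, the effective selection mass `mu_eff`, and the eigendecomposition used for $C^{-1/2}$ follow common practice but are illustrative here:

```python
import numpy as np

def csa_update(p_sigma, sigma, m_old, m_new, C, mu_eff, c_sigma=0.3, d_sigma=1.0):
    """Cumulative step-size adaptation (path length control), sketched.

    The evolution path p_sigma accumulates mean shifts in whitened coordinates;
    sigma is increased when the path is longer than that of a random walk
    (consecutive steps point in similar directions) and decreased when
    consecutive steps cancel each other out.
    """
    n = len(m_old)
    eigvals, B = np.linalg.eigh(C)                       # C = B diag(eigvals) B^T
    C_inv_sqrt = B @ np.diag(eigvals ** -0.5) @ B.T      # C^(-1/2) for whitening
    p_sigma = ((1 - c_sigma) * p_sigma
               + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff)
               * (C_inv_sqrt @ (m_new - m_old)) / sigma)
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # E||N(0, I)||
    sigma *= np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_n - 1))
    return p_sigma, sigma
```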
Apart from the setting of $\lambda$ (or possibly $\mu$ instead, for example if $\lambda$ is predetermined by the number of available processors), the above introduced parameters are not specific to the given objective function and therefore not meant to be modified by the user.
The update equations for mean and covariance matrix maximize a likelihood while resembling an expectation–maximization algorithm.[3]
For $c_\sigma = 1$ and $c_1 = 0$, i.e. without step-size control and rank-one update, CMA-ES can thus be viewed as an instantiation of Natural Evolution Strategies (NES).
Combining the previous equalities we get
$$\tilde\nabla E(f\mid\theta) \;=\; F_\theta^{-1}\,\nabla_\theta E(f\mid\theta) \;=\; F_\theta^{-1}\,E\big(f(x)\,\nabla_\theta \ln p(x\mid\theta)\big).$$
A Monte Carlo approximation of the latter expectation takes the average over $\lambda$ samples from $p$,
$$\tilde\nabla \widehat{E}(f\mid\theta) \;=\; -\sum_{i=1}^{\lambda} \widehat{w}_i\, F_\theta^{-1}\,\nabla_\theta \ln p(x_{i:\lambda}\mid\theta),$$
where the notation $i:\lambda$ from above is used and the rank-based preference weights $\widehat{w}_i$, decreasing in $i$, take the place of the raw objective values $f(x_{i:\lambda})$. With the Fisher information matrix of the Gaussian distribution taken in the parametrization $\theta = (m, \operatorname{vec}(C))$ and, after some calculations, the updates in the CMA-ES turn out as[4]
$$m_{k+1} \;=\; m_k - \tilde\nabla_m \widehat{E}(f\mid\theta) \;=\; m_k + \sum_{i=1}^{\lambda} \widehat{w}_i\,(x_{i:\lambda} - m_k)$$
and
$$C_{k+1} \;=\; C_k - \operatorname{mat}\big(\tilde\nabla_C \widehat{E}(f\mid\theta)\big) \;=\; C_k + \sum_{i=1}^{\lambda} \widehat{w}_i\Big((x_{i:\lambda} - m_k)(x_{i:\lambda} - m_k)^{\mathsf T} - C_k\Big),$$
where mat forms the proper matrix from the respective natural gradient sub-vector.
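The log-likelihood trick behind this Monte Carlo approximation can be checked numerically. The sketch below estimates the natural gradient of $E f$ for the mean parameter only, with the covariance held fixed at $\sigma^2 \mathrm{I}$; the sample size and the test function are illustrative:

```python
import numpy as np

def natural_gradient_mean(f, m, sigma, lam=10_000, rng=None):
    """Monte Carlo natural gradient of E f(x) w.r.t. m, for x ~ N(m, sigma^2 I).

    Log-likelihood trick: grad_m E f = E[f(x) * grad_m ln p(x|m)], with
    grad_m ln p(x|m) = (x - m)/sigma^2.  The Fisher matrix of this
    parametrization is I/sigma^2, so preconditioning multiplies by sigma^2.
    """
    rng = rng or np.random.default_rng(0)
    x = m + sigma * rng.standard_normal((lam, m.size))
    score = (x - m) / sigma**2                    # grad_m ln p(x | m)
    fx = np.array([f(xi) for xi in x])
    return sigma**2 * (fx[:, None] * score).mean(axis=0)

m = np.array([1.0, -2.0])
print(natural_gradient_mean(lambda x: float(x @ x), m, sigma=0.5))  # ~ sigma^2 * 2m
```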
It is comparatively easy to see that the update equations of CMA-ES satisfy some stationarity conditions, in that they are essentially unbiased.
Under neutral selection, where the ranking of the candidate solutions is independent of their sampled values, we find that
$$E(m_{k+1}\mid m_k) = m_k,$$
and under some mild additional assumptions on the initial conditions
$$E(\ln\sigma_{k+1}\mid\sigma_k) = \ln\sigma_k,$$
and with an additional minor correction in the covariance matrix update for the case where the indicator function evaluates to zero, we find
$$E(C_{k+1}\mid C_k) = C_k.$$
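The first of these identities is easy to verify empirically; a short sketch with illustrative recombination weights:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma, lam, mu, trials = np.array([1.0, -3.0]), 2.0, 10, 5, 20_000
w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
w /= w.sum()
acc = np.zeros_like(m)
for _ in range(trials):
    x = m + sigma * rng.standard_normal((lam, m.size))
    order = rng.permutation(lam)       # neutral selection: ranking ignores f-values
    acc += w @ x[order[:mu]]           # weighted recombination of "selected" samples
print("average m_{k+1}:", acc / trials, "  m_k:", m)   # the two agree in expectation
```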
Invariance properties imply uniform performance on a class of objective functions. The CMA-ES is, among other things, invariant under order-preserving (strictly monotonically increasing) transformations of the objective function value and, given a correspondingly transformed initial state, under rotation, reflection, and translation of the search space. These invariances have been argued to be an advantage, because they allow one to generalize and predict the behavior of the algorithm and therefore strengthen the meaning of empirical results obtained on single functions.
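Invariance under order-preserving transformations follows directly from the rank-based selection: the ordering of the candidates, and hence every subsequent update, is unchanged. A small numerical illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 3))                 # eight candidate solutions
f = np.array([float(xi @ xi) for xi in x])      # original objective values
g_of_f = np.exp(f) - 5.0                        # strictly increasing transformation of f
assert (np.argsort(f) == np.argsort(g_of_f)).all()   # identical ranking, hence
print(np.argsort(f))                                 # identical selection and updates
```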
Conceptual considerations like the scale-invariance property of the algorithm, the analysis of simpler evolution strategies, and overwhelming empirical evidence suggest that the algorithm converges fast, on a large class of functions, to the global optimum, denoted as $x^*$.
This convergence is linear, and more rigorously
$$\frac{1}{k}\,\log\frac{\|m_k - x^*\|}{\|m_0 - x^*\|} \;\to\; -c < 0 \quad\text{as } k \to \infty,$$
or similarly,
$$E\log\frac{\|m_{k+1} - x^*\|}{\|m_k - x^*\|} \;\to\; -c < 0.$$
This means that on average the distance to the optimum decreases in each iteration by a "constant" factor, namely by $\exp(-c)$.
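The factor $\exp(-c)$ can be measured empirically. The sketch below uses a $(\mu/\mu, \lambda)$-ES with a scale-invariant step size proportional to the distance to the optimum, a standard analysis device; all constants are illustrative:

```python
import numpy as np

def estimate_rate(n=10, lam=10, mu=5, iters=300, rng=None):
    """Estimate the linear convergence rate c on the sphere f(x) = ||x||^2.

    Reads c off the average per-iteration change of log ||m_k - x*||,
    with the optimum x* = 0.
    """
    rng = rng or np.random.default_rng(0)
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    m, logs = np.ones(n), []
    for _ in range(iters):
        sigma = 0.3 * np.linalg.norm(m) / np.sqrt(n)       # scale-invariant step size
        x = m + sigma * rng.standard_normal((lam, n))
        order = np.argsort([float(xi @ xi) for xi in x])
        m_new = w @ x[order[:mu]]
        logs.append(np.log(np.linalg.norm(m_new) / np.linalg.norm(m)))
        m = m_new
    return -np.mean(logs)          # distance shrinks by about exp(-c) per iteration

print("estimated c:", estimate_rate())
```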
Using a non-identity covariance matrix for the multivariate normal distribution in evolution strategies is equivalent to a coordinate system transformation of the solution vectors,[8] mainly because the sampling equation
$$x_i \;\sim\; m + \sigma\,\mathcal{N}(0, C) \;=\; m + \sigma\,C^{1/2}\,\mathcal{N}(0, \mathrm{I})$$
can be equivalently expressed in an "encoded space" as
$$C^{-1/2} x_i \;\sim\; C^{-1/2} m + \sigma\,\mathcal{N}(0, \mathrm{I}).$$
The covariance matrix defines a bijective transformation (encoding) for all solution vectors into a space where the sampling takes place with identity covariance matrix. This adaptive encoding procedure is not confined to algorithms that sample from a multivariate normal distribution (like evolution strategies), but can in principle be applied to any iterative search method.[8]
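Numerically, the encoding can be realized with an eigendecomposition of $C$; a sketch, where the specific $C$, $m$, and $\sigma$ are illustrative:

```python
import numpy as np

# Sampling x ~ m + sigma*N(0, C) via the eigendecomposition C = B diag(d2) B^T,
# then decoding with C^(-1/2): the encoded samples have identity covariance,
# i.e. in the encoded space the search distribution is isotropic.
rng = np.random.default_rng(0)
m, sigma = np.array([1.0, 2.0]), 0.5
C = np.array([[4.0, 1.5], [1.5, 1.0]])
d2, B = np.linalg.eigh(C)                      # eigenvalues d2, eigenvectors B
z = rng.standard_normal((100_000, 2))          # z ~ N(0, I)
x = m + sigma * (z * np.sqrt(d2)) @ B.T        # x ~ N(m, sigma^2 C)
encoded = ((x - m) / sigma) @ B @ np.diag(d2 ** -0.5) @ B.T   # apply C^(-1/2)
print(np.cov(encoded.T))                       # ~ identity matrix
```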
Optionally, the number of candidate samples λ (population size) can be modified by the user in order to change the characteristic search behavior (see above), and termination conditions can or should be adjusted to the problem at hand.
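For example, with the pycma implementation in Python, the population size and termination options can be set as follows (a usage sketch; option names are taken from that package's documented interface):

```python
import cma                         # pip install cma (the pycma package)

# Population size and termination options adjusted to the problem at hand;
# see cma.CMAOptions() for the full list of documented options.
es = cma.CMAEvolutionStrategy(8 * [0.5], 0.3,
                              {'popsize': 40, 'maxfevals': 20_000, 'tolfun': 1e-11})
while not es.stop():
    X = es.ask()                                  # sample lambda candidate solutions
    es.tell(X, [cma.ff.rosen(x) for x in X])      # rank them by objective value
print(es.result.xbest)                            # best solution found
```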
The CMA-ES has been empirically successful in hundreds of applications and is considered to be useful in particular on non-convex, non-separable, ill-conditioned, multi-modal or noisy objective functions.
Assuming a black-box optimization scenario, where gradients are not available (or not useful) and function evaluations are the only considered cost of search, the CMA-ES method is likely to be outperformed by other methods in the following conditions: on low-dimensional functions, on functions that can already be solved with a comparatively small number of function evaluations, and on separable functions. On separable functions, the performance disadvantage is likely to be most significant, in that CMA-ES might not be able to find comparable solutions at all.
Some Natural Evolution Strategies are close variants of the CMA-ES with specific parameter settings.[12] Another remarkable extension has been the addition of a negative update of the covariance matrix with the so-called active CMA.
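A sketch of such a negative (active) covariance update, under the convention that the steps are $y_i = (x_{i:\lambda} - m)/\sigma$; the learning rates are illustrative, and in practice the negative coefficient must be kept small enough to preserve positive definiteness:

```python
import numpy as np

def active_cov_update(C, y_best, y_worst, w, c_mu=0.25, c_neg=0.1):
    """Rank-mu covariance update with an additional negative (active) term.

    y_best / y_worst hold the steps of the best and the worst candidates as
    rows; w are positive recombination weights summing to 1.  The positive
    term reinforces directions of successful steps, the negative term actively
    shrinks variance along unsuccessful ones.  The scaling keeps E[C_new] = C
    under neutral selection.
    """
    pos = (w[:, None] * y_best).T @ y_best       # sum_i w_i y_i y_i^T (best steps)
    neg = (w[:, None] * y_worst).T @ y_worst     # the same for the worst steps
    return (1 - c_mu + c_neg) * C + c_mu * pos - c_neg * neg
```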