Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs),[1] are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions.
The main difference between EDAs and most conventional evolutionary algorithms is that evolutionary algorithms generate new candidate solutions using an implicit distribution defined by one or more variation operators, whereas EDAs use an explicit probability distribution encoded by a Bayesian network, a multivariate normal distribution, or another model class.
Like other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a variety of representations, from vectors to LISP-style S-expressions, and the quality of candidate solutions is often evaluated using one or more objective functions.
The general procedure of an EDA is outlined in the following: starting from a model that encodes an uninformative distribution over admissible solutions, the algorithm repeatedly samples a population of candidate solutions from the current model, evaluates them, and adjusts the model toward the better solutions, until a termination criterion is met (a minimal sketch of this loop is given below). Using explicit probabilistic models in optimization allowed EDAs to feasibly solve optimization problems that were notoriously difficult for most conventional evolutionary algorithms and traditional optimization techniques, such as problems with high levels of epistasis.[citation needed]
Moreover, EDAs provide an optimization practitioner with a series of probabilistic models that reveal a great deal of information about the problem being solved.
This information can in turn be used to design problem-specific neighborhood operators for local search, to bias future runs of EDAs on a similar problem, or to create an efficient computational model of the problem.
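As an illustration of the general loop outlined above, the following is a minimal sketch in Python. The univariate Bernoulli model, truncation selection, and the onemax fitness function are illustrative assumptions, not part of any particular EDA.

```python
import numpy as np

def onemax(x):
    # Illustrative objective: maximize the number of ones in the bit string.
    return x.sum()

def eda(n_bits=20, pop_size=100, n_select=50, generations=30, seed=0):
    rng = np.random.default_rng(seed)
    # Start from an uninformative model: each bit is 1 with probability 0.5.
    p = np.full(n_bits, 0.5)
    for _ in range(generations):
        # Sample a population of candidate solutions from the explicit model.
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)
        fitness = np.array([onemax(x) for x in pop])
        # Truncation selection: keep the best n_select candidates.
        selected = pop[np.argsort(fitness)[-n_select:]]
        # Adjust the model toward the selected (promising) solutions.
        p = selected.mean(axis=0)
    return p

print(eda().round(2))  # probabilities drift toward 1.0 under onemax
```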
For example, if the population is represented by bit strings of length 4, the EDA can represent the population of promising solutions using a single vector of four probabilities (p1, p2, p3, p4), where each component pi defines the probability of position i being a 1.
Using this probability vector it is possible to create an arbitrary number of candidate solutions.
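A minimal sketch of this example; the probability values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
p = np.array([0.9, 0.1, 0.6, 0.5])  # (p1, p2, p3, p4), illustrative values

# Draw any number of candidate solutions; bit i is 1 with probability p[i].
candidates = (rng.random((8, 4)) < p).astype(int)
print(candidates)
```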
This section describes the models built by some well-known EDAs of different levels of complexity.
The population-based incremental learning (PBIL) algorithm represents the population implicitly by its model, from which it samples new solutions and updates the model. At each generation, $\mu$ individuals are sampled and $\lambda \leq \mu$ of them are selected; the selected individuals then update the model as

$p_{t+1}(X_i) = (1-\gamma)\,p_t(X_i) + \frac{\gamma}{\lambda}\sum_{j=1}^{\lambda} x_i^{(j)},$

where $x^{(1)},\dots,x^{(\lambda)}$ are the selected individuals and $\gamma \in (0,1]$ is a parameter defining the learning rate; a small value determines that the previous model $p_t(X_i)$ should be only slightly modified by the new solutions sampled.
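A sketch of one PBIL generation under the update rule above; the fitness function is passed in, and the values of mu, lam, and gamma are illustrative.

```python
import numpy as np

def pbil_step(p, fitness_fn, mu=100, lam=10, gamma=0.1, rng=None):
    rng = rng or np.random.default_rng()
    # Sample mu individuals from the implicit population defined by p.
    pop = (rng.random((mu, p.size)) < p).astype(int)
    scores = np.array([fitness_fn(x) for x in pop])
    # Select the lam best individuals.
    selected = pop[np.argsort(scores)[-lam:]]
    # Blend the previous model with the selected bit frequencies:
    # p_{t+1} = (1 - gamma) * p_t + (gamma / lam) * sum of selected bits.
    return (1 - gamma) * p + gamma * selected.mean(axis=0)
```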
The compact genetic algorithm (CGA)[7] also relies on the implicit populations defined by univariate distributions. At each generation, two individuals are sampled and compared, and the probability vector is shifted toward the better of the two at the positions where they differ (a sketch is given below).
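A sketch of one CGA generation under the tournament just described; the virtual population size n is illustrative.

```python
import numpy as np

def cga_step(p, fitness_fn, n=100, rng=None):
    rng = rng or np.random.default_rng()
    # Sample two competitors from the implicit population defined by p.
    a = (rng.random(p.size) < p).astype(int)
    b = (rng.random(p.size) < p).astype(int)
    winner, loser = (a, b) if fitness_fn(a) >= fitness_fn(b) else (b, a)
    # Shift the model toward the winner by 1/n where the two bits differ.
    return np.clip(p + (winner - loser) / n, 0.0, 1.0)
```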
Although univariate models can be computed efficiently, in many cases they are not representative enough to provide better performance than GAs.
To overcome such a drawback, the use of bivariate factorizations, in which dependencies between pairs of variables can be modeled, was proposed in the EDA community.
Bivariate and multivariate distributions are usually represented as probabilistic graphical models (graphs), in which edges denote statistical dependencies (or conditional probabilities) and vertices denote variables.
The mutual information maximizing input clustering algorithm (MIMIC)[8] factorizes the joint probability distribution in a chain-like model representing successive dependencies between variables.
New solutions are sampled from the leftmost to the rightmost variable; the first is generated independently and the others according to their conditional probabilities.
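A sketch of this chain sampling, assuming the variable ordering and the conditional probability tables have already been estimated from the selected population (both are passed in here).

```python
import numpy as np

def sample_chain(order, p_first, p_cond, rng=None):
    """order:   permutation of the variable indices (chain, left to right)
       p_first: P(X_order[0] = 1)
       p_cond:  p_cond[k][b] = P(X_order[k] = 1 | X_order[k-1] = b)"""
    rng = rng or np.random.default_rng()
    x = np.zeros(len(order), dtype=int)
    # The first (leftmost) variable is generated independently ...
    x[order[0]] = int(rng.random() < p_first)
    # ... and each following variable conditionally on its predecessor.
    for k in range(1, len(order)):
        prev_bit = x[order[k - 1]]
        x[order[k]] = int(rng.random() < p_cond[k][prev_bit])
    return x
```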
Moving beyond pairwise dependencies, multivariate factorizations model interactions among larger groups of variables. In this case, the joint probability distribution is usually factorized into a number of components of limited size.
The extended compact genetic algorithm (ECGA)[10] was one of the first EDAs to employ multivariate factorizations, in which high-order dependencies among decision variables can be modeled.
The ECGA popularized the term "linkage-learning" to denote procedures that identify linkage sets. Its linkage-learning procedure relies on two measures: the model complexity (MC) and the compressed population complexity (CPC).
The MC quantifies the model representation size in terms of the number of bits required to store all the marginal probabilities,

$MC = \log_2(\lambda+1)\sum_{\tau \in T_{ECGA}} \left(2^{|\tau|}-1\right),$

while the CPC quantifies the data compression in terms of the entropy of the marginal distribution over all partitions,

$CPC = \lambda \sum_{\tau \in T_{ECGA}} H(\tau),$

where $\lambda$ is the selected population size, $|\tau|$ is the number of decision variables in the linkage set $\tau$, and $H(\tau)$ is the joint entropy of the variables in $\tau$. The two measures are combined into a single criterion, $CCC = MC + CPC$, which the ECGA minimizes greedily: starting with each variable in its own linkage set, it repeatedly merges the pair of sets whose union most improves the CCC. This procedure is repeated until no CCC improvements are possible and produces a linkage model $T_{ECGA}$ (a sketch is given below).
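A sketch of this greedy search; the MC and CPC computations follow the definitions above, with the selected population passed in as a 0/1 matrix `samples` (one row per individual).

```python
import numpy as np

def ccc(partition, samples):
    lam = len(samples)  # selected population size
    mc = cpc = 0.0
    for tau in partition:
        cols = samples[:, sorted(tau)]
        # MC: bits to store the 2^|tau| - 1 marginal probabilities of tau.
        mc += np.log2(lam + 1) * (2 ** len(tau) - 1)
        # CPC: entropy of the marginal distribution over this partition.
        _, counts = np.unique(cols, axis=0, return_counts=True)
        freq = counts / lam
        cpc += lam * -(freq * np.log2(freq)).sum()
    return mc + cpc

def greedy_linkage(samples):
    # Start with each variable in its own linkage set.
    partition = [frozenset([i]) for i in range(samples.shape[1])]
    best = ccc(partition, samples)
    while len(partition) > 1:
        # Evaluate every pairwise merge of linkage sets.
        merges = [[s for k, s in enumerate(partition) if k not in (i, j)]
                  + [partition[i] | partition[j]]
                  for i in range(len(partition))
                  for j in range(i + 1, len(partition))]
        score, trial = min(((ccc(m, samples), m) for m in merges),
                           key=lambda sm: sm[0])
        if score >= best:  # stop when no CCC improvement is possible
            break
        best, partition = score, trial
    return partition
```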
The Bayesian optimization algorithm (BOA) uses Bayesian networks to model promising solutions, and the network structure is learned iteratively: it starts with a network without edges and, at each step, adds the edge which most improves some scoring metric, e.g. the Bayesian information criterion (BIC) or the Bayesian-Dirichlet metric with likelihood equivalence (BDe).[14] The scoring metric evaluates the network structure according to its accuracy in modeling the selected population.
From the built network, BOA samples new promising solutions as follows: (1) it computes an ancestral ordering of the variables, each node being preceded by its parents; (2) each variable is then sampled conditionally on its parents (a sketch is given below).
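A sketch of this sampling step, assuming the network structure (each variable's parent set) and the conditional probability tables have already been learned; Python's standard graphlib provides the ancestral (topological) ordering.

```python
import numpy as np
from graphlib import TopologicalSorter

def sample_bayes_net(parents, cpt, rng=None):
    """parents: dict mapping each variable to a tuple of its parents
       cpt:     cpt[v][bits] = P(v = 1 | parents of v take the values bits)"""
    rng = rng or np.random.default_rng()
    # (1) ancestral ordering: every node is preceded by its parents.
    order = TopologicalSorter(parents).static_order()
    x = {}
    # (2) sample each variable conditionally on its already-sampled parents.
    for v in order:
        bits = tuple(x[u] for u in parents[v])
        x[v] = int(rng.random() < cpt[v][bits])
    return x

# Hypothetical 3-variable network: 0 -> 1, {0, 1} -> 2.
parents = {0: (), 1: (0,), 2: (0, 1)}
cpt = {0: {(): 0.7},
       1: {(0,): 0.2, (1,): 0.9},
       2: {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.95}}
print(sample_bayes_net(parents, cpt))
```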
The linkage model of the LTGA is a linkage-tree, produced and stored as a family of sets (FOS).
The linkage-tree learning procedure is a hierarchical clustering algorithm, which works as follows. Starting with each variable in its own cluster, the two closest clusters are repeatedly merged until only one cluster remains, and each resulting subtree is stored as a subset of the FOS. The LTGA uses this linkage-tree to guide an "optimal mixing" procedure, which resembles a recombination operator but only accepts improving moves.
Similar ideas have usually been applied in local-search heuristics and, in this sense, the LTGA can be seen as a hybrid method.
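A sketch of both steps under simplifying assumptions: `dist` is a user-supplied distance between clusters of variables (LTGA typically derives it from mutual information measured on the population), and optimal mixing splices each linkage subset from a random donor, keeping the change only when fitness does not worsen.

```python
import numpy as np

def linkage_tree(n_vars, dist):
    # Hierarchical (agglomerative) clustering over the variable indices.
    clusters = [frozenset([i]) for i in range(n_vars)]
    fos = list(clusters)  # family of sets: every subtree is stored
    while len(clusters) > 1:
        # Merge the two closest clusters ...
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        fos.append(merged)  # ... and store the new subtree as a subset
    return fos

def optimal_mixing(x, donors, fos, fitness_fn, rng=None):
    rng = rng or np.random.default_rng()
    fx = fitness_fn(x)
    for subset in fos:
        donor = donors[rng.integers(len(donors))]
        trial = x.copy()
        trial[list(subset)] = donor[list(subset)]  # splice one linkage block
        ft = fitness_fn(trial)
        if ft >= fx:  # keep the change only if fitness does not worsen
            x, fx = trial, ft
    return x
```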