Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function.
For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, simulated annealing may be preferable to exact algorithms such as gradient descent or branch and bound.
The name of the algorithm comes from annealing in metallurgy, a technique involving heating and controlled cooling of a material to alter its physical properties.
The problems solved by SA are currently formulated by an objective function of many variables, subject to several mathematical constraints.
Similar techniques have been independently introduced on several occasions, including Pincus (1970),[2] Khachaturyan et al (1979,[3] 1981[4]), Kirkpatrick, Gelatt and Vecchi (1983), and Cerny (1985).
The simulation can be performed either by a solution of kinetic equations for probability density functions,[7][8] or by using a stochastic sampling method.
Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.
, the system will then increasingly favor moves that go "downhill" (i.e., to lower energy values), and avoid those that go "uphill."
set to a high value (or infinity), and then it is decreased at each step following some annealing schedule—which may be specified by the user but must end with
In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower and narrower, and finally move downhill according to the steepest descent heuristic.
In order to apply the simulated annealing method to a specific problem, one must specify the following parameters: the state space, the energy (goal) function E(), the candidate generator procedure neighbor (), the acceptance probability function P(), and the annealing schedule temperature() AND initial temperature init_temp.
Simulated annealing may be modeled as a random walk on a search graph, whose vertices are all possible states, and whose edges are the candidate moves.
To investigate the behavior of simulated annealing on a particular problem, it can be useful to consider the transition probabilities that result from the various design choices made in the implementation of the algorithm.
This formula was superficially justified by analogy with the transitions of a physical system; it corresponds to the Metropolis–Hastings algorithm, in the case where T=1 and the proposal distribution of Metropolis–Hastings is symmetric.
However, this acceptance probability is often used for simulated annealing even when the neighbor () function, which is analogous to the proposal distribution in Metropolis–Hastings, is not symmetric, or not probabilistic at all.
need not bear any resemblance to the thermodynamic equilibrium distribution over states of that physical system, at any temperature.
Nevertheless, most descriptions of simulated annealing assume the original acceptance function, which is probably hard-coded in many implementations of SA.
In 1990, Moscato and Fontanari,[13] and independently Dueck and Scheuer,[14] proposed that a deterministic update (i.e. one that is not based on the probabilistic acceptance rule) could speed-up the optimization process without impacting on the final quality.
Moscato and Fontanari conclude from observing the analogous of the "specific heat" curve of the "threshold updating" annealing originating from their study that "the stochasticity of the Metropolis updating in the simulated annealing algorithm does not play a major role in the search of near-optimal minima".
Instead, they proposed that "the smoothening of the cost function landscape at high temperature and the gradual definition of the minima during the cooling process are the fundamental ingredients for the success of simulated annealing."
In 2001, Franz, Hoffmann and Salamon showed that the deterministic update strategy is indeed the optimal one within the large class of algorithms that simulate a random walk on the cost/energy landscape.
Therefore, as a general rule, one should skew the generator towards candidate moves where the energy of the destination state
Thus, the consecutive-swap neighbor generator is expected to perform better than the arbitrary-swap one, even though the latter could provide a somewhat shorter path to the optimum (with
Thus, in the traveling salesman example above, one could use a neighbor () function that swaps two random cities, where the probability of choosing a city-pair vanishes as their distance increases beyond
On the other hand, one can often vastly improve the efficiency of simulated annealing by relatively simple changes to the generator.
The physical analogy that is used to justify simulated annealing assumes that the cooling rate is low enough for the probability distribution of the current state to be near thermodynamic equilibrium at all times.
Unfortunately, the relaxation time—the time one must wait for the equilibrium to be restored after a change in temperature—strongly depends on the "topography" of the energy function and on the current temperature.
In the simulated annealing algorithm, the relaxation time also depends on the candidate generator, in a very complicated way.
Note that all these parameters are usually provided as black box functions to the simulated annealing algorithm.
Adaptive simulated annealing algorithms address this problem by connecting the cooling schedule to the search progress.