Maximal entropy random walk

Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen accordingly to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy.

While standard random walk chooses for every vertex uniform probability distribution among its outgoing edges, locally maximizing entropy rate, MERW maximizes it globally (average entropy production) by assuming uniform probability distribution among all paths in a given graph.

A direct application is choosing probabilities to maximize transmission rate through a constrained channel, analogously to Fibonacci coding.

Its properties also made it useful for example in analysis of complex networks,[1] like link prediction,[2] community detection,[3] robust transport over networks[4] and centrality measures.

[9] Additionally, it recreates some properties of quantum mechanics, suggesting a way to repair the discrepancy between diffusion models and quantum predictions, like Anderson localization.

For simplicity assume it is an undirected graph, which corresponds to a symmetric

; however, MERW can also be generalized for directed and weighted graphs (for example Boltzmann distribution among paths instead of uniform).

We would like to choose a random walk as a Markov process on this graph: for every vertex

(containing the transition probabilities of a Markov chain) such that Assuming this graph is connected and not periodic, ergodic theory says that evolution of this stochastic process leads to some stationary probability distribution

In the standard random walk, referred to here as generic random walk (GRW), we naturally choose that each outgoing edge is equally probable: For a symmetric

, or equivalently assumes uniform probability distribution among all paths in a given graph.

Then stochastic matrix and stationary probability distribution are given by for which every possible path of length

is In contrast to GRW, the MERW transition probabilities generally depend on the structure of the entire graph (are nonlocal).

Hence, they should not be imagined as directly applied by the walker – if random-looking decisions are made based on the local situation, like a person would make, the GRW approach is more appropriate.

MERW is based on the principle of maximum entropy, making it the safest assumption when we don't have any additional knowledge about the system.

For example, it would be appropriate for modelling our knowledge about an object performing some complex dynamics – not necessarily random, like a particle.

Assume for simplicity that the considered graph is indirected, connected and aperiodic, allowing to conclude from the Perron–Frobenius theorem that the dominant eigenvector is unique.

MERW requires uniform distribution along paths.

, we get: As in Boltzmann distribution of paths for energy defined as sum of

For example, it allows to calculate probability distribution of patterns in Ising model.

Let us first look at a simple nontrivial situation: Fibonacci coding, where we want to transmit a message as a sequence of 0s and 1s, but not using two successive 1s: after a 1 there has to be a 0.

To maximize the amount of information transmitted in such sequence, we should assume uniform probability distribution in the space of all possible sequences fulfilling this constraint.

To practically use such long sequences, after 1 we have to use 0, but there remains a freedom of choosing the probability of 0 after 0.

, then entropy coding would allow encoding a message using this chosen probability distribution.

In contrast, standard random walk would choose suboptimal

A more complex example is the defected one-dimensional cyclic lattice: let say 1000 nodes connected in a ring, for which all nodes but the defects have a self-loop (edge to itself).

For MERW we have to first find the dominant eigenvector of the adjacency matrix – maximizing

The formula inside the bracket is discrete Laplace operator, making this equation a discrete analogue of stationary Schrodinger equation.

with its strongly localized density (in contrast to standard diffusion).

Taking the infinitesimal limit, we can get standard continuous stationary (time-independent) Schrodinger equation (

Left: basic concept of the generic random walk (GRW) and maximal entropy random walk (MERW)
Right: example of their evolution on the same inhomogeneous 2D lattice with cyclic boundary conditions – probability density after 10, 100 and 1000 steps while starting from the same vertex. The small boxes represent defects: all vertices but the marked ones have additional self-loop (edge to itself). For regular lattices (no defects), GRW and MERW are identical. While defects do not strongly affect the local beha­vior, they lead to a completely different global stationary probability here. While GRW (and based on it standard diffusion ) leads to nearly uniform stationary density, MERW has strong localization property, imprisoning the walkers in entropic wells in analogy to electrons in defected lattice of semi-conductor .
Left: choosing the optimal probability after symbol 0 in Fibonacci coding . Right: one-dimensional defected lattice and its stationary density for length 1000 cycle (it has three defects). While in standard random walk the stationary density is proportional to degree of a vertex, leading to 3/2 difference here, in MERW density is nearly completely localized in the largest defect-free region, analogous to the ground state predicted by quantum mechanics .