Exponential family random graph models

Exponential Random Graph Models (ERGMs) are a family of statistical models for analyzing data from social and other networks.

Many metrics exist to describe the structural features of an observed network such as the density, centrality, or assortativity.

[9] This set of alternative networks may have similar or dissimilar structural features.

To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network.

However because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression.

[10][2] Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency about network substructures of theoretical interest, disambiguating the influence of confounding processes, efficiently representing complex structures, and linking local-level processes to global-level properties.

[11] Degree-preserving randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks.

An ERGM is a model from this family which describes networks.

The basic assumption of these models is that the structure in an observed graph

which are a function of the observed network and, in some cases, nodal attributes.

This way, it is possible to describe any kind of dependence between the undyadic variables:

These models represent a probability distribution on each possible network on

Because the number of possible networks in the set vastly outnumbers the number of parameters which can constrain the model, the ideal probability distribution is the one which maximizes the Gibbs entropy.

Intuitively, the structure of graph probabilities in this ERGM example are consistent with typical patterns of social or other networks.

This is consistent with the sparsity that is often found in empirical networks, namely that the empirical number of edges typically grows at a slower rate than the maximally possible number of edges.

This is consistent with a tendency for triadic closure that is often found in certain types of social networks.

Compare these patterns with the graph probabilities computed above.

Exact sampling from a given ERGM is computationally intractable in general since computing the normalizing constant requires summation over all

Efficient approximate sampling from an ERGM can be done via Markov chains and is applied in current methods to approximate expected values and to estimate ERGM parameters.

(which might be arbitrarily, or randomly, chosen or might represent an observed network) and implicitly defines transition probabilities (or jump probabilities)

, which are the conditional probabilities that the Markov chain is on graph

The transition probabilities do not depend on the graphs in earlier steps (

), which is a defining property of Markov chains, and they do not depend on

If this is achieved, one can run the Markov chain for a large number of steps and then returns the current graph as a random sample from the given ERGM.

after a finite but large number of update steps is approximately the probability defined by the ERGM.

Current methods for sampling from ERGMs with Markov chains[13] usually define an update step by two sub-steps: first, randomly select a candidate

(If the candidate is not accepted, the Markov chain remains on the current graph

is unconstrained (i.e., contains any combination of values on the binary tie variables), a simple method for candidate selection is to choose one tie variable

uniformly at random and to define the candidate by flipping this single variable (i.e., to set

cancels out in this fraction, so that the acceptance probabilities can be computed efficiently.