Closeness centrality

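Closeness was defined by Bavelas (1950) as the reciprocal of the farness,[1][2] that is:

$$C(x) = \frac{1}{\sum_{y} d(y,x)}$$

where $d(y,x)$ is the distance (length of the shortest path) between vertices $x$ and $y$.

[3][4][5] When speaking of closeness centrality, people usually refer to its normalized form, which is based on the average length of the shortest paths instead of their sum. It is generally obtained by multiplying the reciprocal of the farness by $N-1$, where $N$ is the number of nodes in the graph:

$$C(x) = \frac{N-1}{\sum_{y} d(y,x)}$$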

For large graphs, the minus one in the normalisation becomes inconsequential and it is often dropped.
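The definition can be illustrated with a minimal Python sketch. It assumes an unweighted, connected, undirected graph stored as an adjacency dictionary; the names `bfs_distances`, `closeness` and the example graph `path` are illustrative and not taken from any library.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, x, normalized=True):
    """Closeness of x: reciprocal of the farness, optionally multiplied by N - 1."""
    dist = bfs_distances(adj, x)
    if len(dist) != len(adj):
        raise ValueError("graph is not connected; the farness would be infinite")
    farness = sum(dist.values())  # d(x, x) = 0 contributes nothing to the sum
    if farness == 0:
        raise ValueError("closeness is undefined for a single-node graph")
    scale = len(adj) - 1 if normalized else 1
    return scale / farness


# Hypothetical example: the path graph a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

print(closeness(path, "a"))                    # end vertex: 3 / (1 + 2 + 3)
print(closeness(path, "b"))                    # inner vertex: 3 / (1 + 1 + 2)
print(closeness(path, "b", normalized=False))  # plain reciprocal of the farness
```

In this sketch the inner vertices of the path score higher than its end vertices, as expected for a measure that rewards short distances to all other nodes.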

[9][10][11][12] The values produced by many centrality measures can be highly correlated.

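[9][13][11] In particular, closeness and degree have been shown[12] to be related in many networks through an approximate relationship

$$\frac{1}{C(x)} \approx -\frac{1}{\ln(z)} \ln(k_x) + \beta$$

where $k_x$ is the degree of vertex $x$, while $z$ and $\beta$ are parameters found by fitting closeness and degree to this formula.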

The z parameter represents the branching factor, the average degree of nodes (excluding the root node and leaves) of the shortest-path trees used to approximate networks when demonstrating this relationship.

[12] This is never an exact relationship but it captures a trend seen in many real-world networks.
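The fitting step can be sketched as an ordinary least-squares regression of inverse closeness on the logarithm of degree. The sketch below assumes the relationship takes the form 1/C(x) ≈ −ln(k_x)/ln(z) + β with normalized closeness, and a connected, unweighted, undirected graph containing at least two distinct degrees; the function name `fit_closeness_degree` and the star-graph example are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def fit_closeness_degree(adj):
    """Least-squares fit of 1/C(x) = slope * ln(k_x) + beta; returns (z, beta).

    Assumes the form 1/C(x) ~ -ln(k_x)/ln(z) + beta, so z = exp(-1/slope).
    """
    n = len(adj)
    xs, ys = [], []
    for node in adj:
        dist = bfs_distances(adj, node)
        xs.append(math.log(len(adj[node])))      # ln of the degree k_x
        ys.append(sum(dist.values()) / (n - 1))  # 1 / normalized closeness
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    beta = mean_y - slope * mean_x
    z = math.exp(-1.0 / slope)
    return z, beta


# Hypothetical example: a star with one hub and five leaves
star = {"hub": ["l1", "l2", "l3", "l4", "l5"]}
for leaf in star["hub"]:
    star[leaf] = ["hub"]

z, beta = fit_closeness_degree(star)
print(z, beta)
```

A star has only two distinct degrees, so the fit is exact here; on real networks the fitted line only captures the trend, and the residuals show how far individual nodes deviate from it.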

Closeness is related to other length scales used in network science.

For instance, the average shortest path length $\langle \ell \rangle$, the average distance between vertices in a network, is simply the average of the inverse closeness values:

$$\langle \ell \rangle = \frac{1}{N} \sum_{x} \frac{1}{C(x)}$$

where $C(x)$ is the normalized closeness.

Whether distances are taken from or to all other nodes is irrelevant in undirected graphs, but it can produce totally different results in directed graphs (e.g. a website can have a high closeness centrality from outgoing links, but a low closeness centrality from incoming links).
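Both points can be checked numerically with a short Python sketch. It assumes unweighted graphs stored as adjacency dictionaries in which every node can reach every other; the helper names (`mean_distance_from`, `out_closeness`, `in_closeness`, `average_path_length`) and the example graphs are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reverse(adj):
    """Return the graph with every edge direction flipped."""
    rev = {u: [] for u in adj}
    for u, neighbours in adj.items():
        for v in neighbours:
            rev[v].append(u)
    return rev


def mean_distance_from(adj, x):
    """Average distance from x to the other nodes (inverse normalized closeness)."""
    dist = bfs_distances(adj, x)
    if len(dist) != len(adj):
        raise ValueError("some node is unreachable from x")
    return sum(dist.values()) / (len(adj) - 1)


def out_closeness(adj, x):
    """Normalized closeness using distances from x along outgoing links."""
    return 1.0 / mean_distance_from(adj, x)


def in_closeness(adj, x):
    """Normalized closeness using distances towards x along incoming links."""
    return 1.0 / mean_distance_from(reverse(adj), x)


def average_path_length(adj):
    """Average distance over all ordered pairs of distinct nodes."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, x).values()) for x in adj)
    return total / (n * (n - 1))


# Undirected path a - b - c - d: mean inverse closeness equals the average path length
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
mean_inverse = sum(1.0 / out_closeness(path, x) for x in path) / len(path)
print(mean_inverse, average_path_length(path))

# Directed example: "a" links to every page, but is reached only through d
web = {"a": ["b", "c", "d"], "b": ["c"], "c": ["d"], "d": ["a"]}
print(out_closeness(web, "a"), in_closeness(web, "a"))
```

In the directed example, node `a` reaches every other node in one step, while the other nodes need up to three steps to reach `a`, so its outgoing closeness is twice its incoming closeness.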

In bibliometrics closeness has been used to look at the way academics choose their journals and bibliographies in different fields[14] or to measure the impact of an author on a field and their social capital.

[16] The closeness of a city in an air transport network has been shown to be highly correlated with socio-economic indicators such as gross regional domestic product.

[17] Closeness has also been applied to biological networks,[5] where, for instance, it was used to identify more than 50% of the global regulators within the top 2% of the ranked genes,[18] and essential genes were found to have higher closeness than nonessential genes in protein-interaction networks.

[19] In a metabolic network the closeness of nodes can identify the most important metabolites.

[20] For graphs that are not strongly connected, Beauchamp introduced in 1965 the idea of using the sum of the reciprocals of the distances,[21] instead of the reciprocal of the sum of distances, with the convention $1/\infty = 0$:

$$H(x) = \sum_{y \neq x} \frac{1}{d(y,x)}$$

Beauchamp's modification follows the general principle, proposed much later by Marchiori and Latora (2000),[22] that in graphs with infinite distances the harmonic mean behaves better than the arithmetic mean.

This idea has resurfaced several times in the literature, often without the normalization factor: for undirected graphs under the name valued centrality by Dekker (2005)[23] and under the name harmonic centrality by Rochat (2009);[24] it was axiomatized by Garg (2009)[25] and proposed once again later by Opsahl (2010).

[26] It was studied on general directed graphs by Boldi and Vigna (2014).
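A minimal Python sketch of the sum-of-reciprocals idea follows. It is unnormalized, assumes an unweighted undirected graph stored as an adjacency dictionary, and treats unreachable nodes as contributing zero (the convention 1/∞ = 0). The function name `harmonic_centrality` and the example graph are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def harmonic_centrality(adj, x):
    """Sum of reciprocal distances to x; unreachable nodes add 0 (1/inf = 0)."""
    dist = bfs_distances(adj, x)
    # Nodes missing from dist are unreachable and are skipped, i.e. they contribute 0.
    return sum(1.0 / d for node, d in dist.items() if node != x)


# Hypothetical disconnected graph: a path a - b - c and a separate edge d - e
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["e"],
    "e": ["d"],
}

for node in graph:
    print(node, harmonic_centrality(graph, node))
```

Unlike the reciprocal of the farness, this quantity stays finite and comparable across components: `b` scores highest because it is adjacent to both other nodes of its component.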

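In the generalized closeness $D(x) = \sum_{y \neq x} \alpha^{d(y,x)}$ with $\alpha \in (0,1)$, as $\alpha$ increases from 0 to 1 the measure changes from a local characteristic (degree) to a global one (number of connected nodes).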
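The two limits can be checked numerically. The sketch below assumes the generalized closeness has the form D(x) = Σ α^d(y,x) over all nodes y ≠ x reachable from x, with α strictly between 0 and 1; this form, the function name `generalized_closeness` and the example graph are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def generalized_closeness(adj, x, alpha):
    """Assumed form: sum of alpha ** d(y, x) over reachable nodes y != x."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    dist = bfs_distances(adj, x)
    return sum(alpha ** d for node, d in dist.items() if node != x)


# Hypothetical graph: a path a - b - c - d plus an isolated node e
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}

small, large = 1e-9, 1.0 - 1e-9
# Small alpha: after dividing by alpha, the value approaches the degree of the node.
print(generalized_closeness(graph, "a", small) / small)
# Alpha near 1: the value approaches the number of nodes connected to the node.
print(generalized_closeness(graph, "a", large))
```

For node `a`, which has degree 1 and three other nodes in its component, the first value is close to 1 and the second close to 3.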

The information centrality of Stephenson and Zelen (1989) is another closeness measure, which computes the harmonic mean of the resistance distances towards a vertex x, which is smaller if x has many paths of small resistance connecting it to other vertices.

[35] In the classic definition of the closeness centrality, the spread of information is modeled by the use of shortest paths.

This model might not be the most realistic for all types of communication scenarios.

Thus, related definitions have been discussed to measure closeness, like the random walk closeness centrality introduced by Noh and Rieger (2004).

It measures the speed with which randomly walking messages reach a vertex from elsewhere in the graph.