The inspiration for this method of community detection is the optimization of modularity as the algorithm progresses.
Modularity is a scalar value between −1 (non-modular clustering) and 1 (fully modular clustering) that measures the relative density of edges inside communities with respect to edges outside communities.
Optimizing this value theoretically results in the best possible grouping of the nodes of a given network.
The Louvain algorithm was shown to correctly identify the community structure when it exists, in particular in the stochastic block model.
Formally, the modularity is defined as

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j),$$

where $A_{ij}$ represents the edge weight between nodes $i$ and $j$; $k_i$ and $k_j$ are the sums of the weights of the edges attached to nodes $i$ and $j$, respectively; $m$ is the sum of all of the edge weights in the graph; $c_i$ and $c_j$ are the communities to which nodes $i$ and $j$ belong; and $\delta$ is the Kronecker delta. As nodes in different communities do not contribute to the modularity Q, it can be written as:

$$Q = \sum_{c=1}^{n}\left[\frac{\Sigma_{in}^{c}}{2m} - \left(\frac{\Sigma_{tot}^{c}}{2m}\right)^{2}\right],$$

where the sum runs over the $n$ communities, $\Sigma_{in}^{c}$ is the sum of the weights of the edges between nodes inside community $c$ (each such edge counted twice), and $\Sigma_{tot}^{c}$ is the sum of the weights of all edges incident to nodes in community $c$.
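To make the second form concrete, here is a minimal Python sketch that evaluates Q for a given partition of an undirected weighted graph (the function and variable names are illustrative, not taken from the original paper):

```python
from collections import defaultdict

def modularity(edges, community_of):
    # Evaluate Q = sum_c [ Sigma_in^c / (2m) - (Sigma_tot^c / (2m))^2 ]
    # for an undirected weighted graph given as (u, v, weight) triples;
    # community_of maps each node to its community label.
    two_m = 0.0                     # 2m: every edge counted from both endpoints
    sigma_in = defaultdict(float)   # weight of edges inside each community, counted twice
    sigma_tot = defaultdict(float)  # total weight incident to each community
    for u, v, w in edges:
        two_m += 2 * w
        sigma_tot[community_of[u]] += w
        sigma_tot[community_of[v]] += w
        if community_of[u] == community_of[v]:
            sigma_in[community_of[u]] += 2 * w
    return sum(sigma_in[c] / two_m - (sigma_tot[c] / two_m) ** 2
               for c in sigma_tot)

# Two triangles joined by a single edge: the natural split scores higher.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
print(modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}))  # ~0.357
print(modularity(edges, {n: "a" for n in range(6)}))                        # 0.0
```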
The Louvain method begins by considering each node v in a graph to be its own community.
For each node v, the change in modularity is then calculated for removing v from its own community and placing it in the community of each of its neighbors; v is moved into the community for which this gain is largest, provided the gain is positive, and otherwise stays in its original community.[5] This process is applied repeatedly and sequentially to all nodes until no modularity increase can occur.
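The pseudo-code for this first phase is not reproduced in this excerpt. As a stand-in, here is an unweighted Python sketch that reuses the modularity function above; the name moveNodes matches the function referred to in the complexity discussion below, but the body is an illustrative reconstruction rather than the published pseudo-code:

```python
def moveNodes(adj, community_of):
    # First phase: sweep over all nodes, moving each node to the
    # neighbouring community that most increases modularity; repeat until
    # a full sweep produces no move.  adj maps each node to a set of
    # neighbours (unweighted); community_of maps each node to its current
    # community label and is updated in place.  For clarity this sketch
    # recomputes Q from scratch; real implementations use an O(1)
    # incremental gain formula instead.
    edges = [(u, v, 1) for u in adj for v in adj[u] if u < v]
    improved = True
    while improved:
        improved = False
        for v in adj:
            start = community_of[v]
            best_c, best_q = start, modularity(edges, community_of)
            for c in {community_of[u] for u in adj[v]} - {start}:
                community_of[v] = c            # tentatively move v into c
                q = modularity(edges, community_of)
                if q > best_q:
                    best_c, best_q = c, q
            community_of[v] = best_c           # keep the best move found
            improved |= best_c != start
    return community_of

# Two triangles joined by one edge: one call groups each triangle together.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(moveNodes(adj, {v: v for v in adj}))  # e.g. {0: 1, 1: 1, 2: 1, 3: 5, 4: 5, 5: 5}
```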
This function does not show the edges being weighted, but a simple modification would allow for that information to be tracked.
From here, the process can be repeated so that more nodes are moved into existing communities until an optimal level of modularity is reached.
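Concretely, the graph is aggregated between repetitions: every community found in the first phase becomes a single node of a new, condensed graph, and the first phase is run again on that graph. A matching sketch, again unweighted and with the name aggregateGraph chosen to mirror the discussion below:

```python
from collections import defaultdict

def aggregateGraph(adj, community_of):
    # Second phase: collapse each community into one node of a new graph.
    # Edges between distinct communities become edges between the new
    # nodes.  As noted above, this unweighted sketch does not track edge
    # weights (parallel-edge multiplicities or intra-community self-loops),
    # though a simple modification would allow it.
    new_adj = defaultdict(set)
    for u in adj:
        cu = community_of[u]
        new_adj[cu]                    # ensure every community appears
        for v in adj[u]:
            cv = community_of[v]
            if cu != cv:
                new_adj[cu].add(cv)    # symmetric because adj is symmetric
    return dict(new_adj)
```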
The pseudo-code below shows how the previous two functions work together to complete the process.[5]
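That pseudo-code is also not reproduced in this excerpt; continuing the sketches above, the driver could look as follows (the mapping bookkeeping, which traces every original node to its final community, is an added illustrative detail):

```python
def louvain(adj):
    # Alternate the two phases until no communities merge, then report the
    # final community of every original node.  Because this unweighted
    # sketch discards weights during aggregation, later levels can
    # over-merge; the weighted modification mentioned above avoids this.
    mapping = {v: v for v in adj}            # original node -> current super-node
    while True:
        community_of = {v: v for v in adj}   # every node starts in its own community
        moveNodes(adj, community_of)
        if len(set(community_of.values())) == len(adj):
            return mapping                   # no communities merged: finished
        mapping = {v: community_of[c] for v, c in mapping.items()}
        adj = aggregateGraph(adj, community_of)
```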
Generally, the Louvain method is assumed to have a time complexity of $O(n \log n)$. Vincent Blondel, co-author of the paper that originally published the Louvain method, seems to support this notion,[6] but other sources claim the time complexity is "essentially linear in the number of links in the graph,"[7] meaning the time complexity would instead be $O(m)$, where $m$ is the number of links (edges) in the graph.
Unfortunately, no source has published an analysis of the Louvain method's time complexity, so one is attempted here.
In the pseudo-code above, the function louvain controls the execution of the algorithm.
It is clear that, inside louvain, moveNodes will be repeated until it is no longer possible to combine nodes into communities.
This depends on two factors: how much the modularity of the graph can improve, and, in the worst case where the modularity improves with every iteration of louvain, how quickly aggregateGraph reduces the graph to a single node.
If, in each iteration of louvain, moveNodes is only able to move one node into a community, then aggregateGraph will only be able to reduce the size of the graph by one.
Since moveNodes iterates through all nodes in a graph, this would result in a time complexity of $O(n^2)$.
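Spelled out (a back-of-the-envelope bound rather than a published analysis): under this worst case, the $i$-th iteration of louvain sweeps a graph of $n - i + 1$ nodes, so the total work is on the order of

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = O(n^2).$$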
Blondel et al. state in their original publication that most of the run time is spent in the early iterations of the algorithm because "the number of communities decreases drastically after just a few passes." The best case, then, would be one in which every pass of moveNodes merged enough nodes to halve the graph; in this case, aggregateGraph would return a graph half the size of the original on each iteration of louvain.
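Under that halving assumption (again a back-of-the-envelope calculation), louvain would run for about $\log_2 n$ iterations, and the total number of node visits performed by moveNodes would be

$$\sum_{k=0}^{\lfloor \log_2 n \rfloor} \frac{n}{2^k} < 2n = O(n),$$

which is consistent with the observation that the early passes dominate the run time.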
Additionally, there is no guarantee the size of the graph would be reduced by the same factor with each iteration, and so no single logarithm function can perfectly describe the time complexity.
A further limitation is that the Louvain method, like any method that partitions the nodes, assigns each node to exactly one community, whereas many real networks contain overlapping communities. For example, in social networks, most people belong to multiple communities: their family, their friends, their co-workers, old school buddies, etc.
In biological networks, most genes or proteins belong to more than one pathway or complex.
Furthermore, Louvain has been shown to sometimes produce arbitrarily badly connected communities, in the worst case communities that are completely disconnected internally, and has been effectively superseded (at least in the non-overlapping case) by the Leiden algorithm.
While this is the worst-case scenario, there are other, more subtle problems with the Louvain algorithm that can also lead to arbitrarily badly connected communities, such as the formation of communities using nodes that are only weakly connected.
A separate issue is the resolution limit of modularity: optimizing modularity cannot detect communities below a scale that depends on the size of the network as a whole, which causes the smaller communities to be hidden.
Both the resolution limit of modularity and the arbitrarily badly connected community problem are further exacerbated by each iteration of the algorithm.
Ultimately, the only thing the Louvain algorithm guarantees is that the resulting communities cannot be merged further; in other words, they're well-separated.
To avoid the problems that arise from arbitrarily badly connected communities and the resolution limit of modularity, it is recommended to use the Leiden algorithm instead, as its refinement phase and various other adjustments correct these issues.
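As an illustration, assuming the python-igraph and leidenalg packages (this usage example is an addition here, not part of the original sources):

```python
import igraph as ig
import leidenalg

# Two triangles joined by a single edge.
g = ig.Graph(edges=[(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

# Leiden with modularity as the quality function; its refinement phase
# guarantees that every returned community is internally connected.
partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
print(list(partition))  # e.g. [[0, 1, 2], [3, 4, 5]]
```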
The compared methods are the algorithm of Clauset, Newman, and Moore,[2] that of Pons and Latapy,[11] and that of Wakita and Tsurumi.