Graph partition

Recently, the graph partition problem has gained importance due to its applications in clustering and the detection of cliques in social, pathological and biological networks.

For a survey on recent trends in computational methods and applications, see Buluç et al.

Solutions to these problems are generally derived using heuristics and approximation algorithms.

Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist,[4] unless P=NP.

Such partition problems have been discussed in the literature as bicriteria-approximation or resource augmentation approaches.

The planar separator theorem also implies that every planar graph of bounded degree has a balanced cut with O(√n) edges.

Since graph partitioning is a hard problem, practical solutions are based on heuristics.

A major drawback of many of these heuristics is that they start from an arbitrary initial partitioning of the vertex set, which can affect the final solution quality.

The most common example is spectral partitioning, where a partition is derived from approximate eigenvectors of the adjacency matrix, or spectral clustering that groups graph vertices using the eigendecomposition of the graph Laplacian matrix.
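
As a minimal sketch of spectral bisection (the 5-vertex example graph and all variable names here are invented for illustration), one can form the unnormalized Laplacian L = D − A, take the eigenvector of its second-smallest eigenvalue (the Fiedler vector), and split the vertices by the signs of its entries:

```python
import numpy as np

# Adjacency matrix of a small undirected example graph (illustrative only):
# vertices 0, 1, 2 form a triangle; vertex 3 hangs off vertex 2, and vertex 4 off vertex 3.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))    # degree matrix
L = D - A                     # unnormalized graph Laplacian

# eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(L)
fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue

# Bisect by the sign of the Fiedler vector's entries.
part_a = np.flatnonzero(fiedler < 0)
part_b = np.flatnonzero(fiedler >= 0)
print(part_a, part_b)
```

For this particular graph the sign pattern separates vertices {0, 1, 2} from {3, 4} (up to an overall sign flip of the eigenvector), i.e. it cuts the single edge joining the triangle to the attached path.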

A wide variety of partitioning and refinement methods can be applied within the overall multi-level scheme, in which the graph is repeatedly coarsened, an initial partition is computed on the coarsest graph, and that partition is then projected back and refined on each finer level.

In many cases, this approach can give both fast execution times and very high quality results.

An alternative approach, originating in [9] and implemented, e.g., in scikit-learn, is spectral clustering, with the partitioning determined from eigenvectors of the graph Laplacian matrix of the original graph computed by the LOBPCG solver with multigrid preconditioning.

Division into a larger number of communities can be achieved by repeated bisection or by using multiple eigenvectors corresponding to the smallest eigenvalues.
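
One common way to use several eigenvectors is sketched below (assumptions: the unnormalized Laplacian and a k-means step are acceptable choices, and the function name spectral_kway is made up for this example): each vertex is embedded using its entries in the k smallest nontrivial eigenvectors, and the embedded points are then clustered.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_kway(adjacency: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Split a connected graph into k parts using a spectral embedding.

    Illustrative sketch: unnormalized Laplacian, dense eigendecomposition,
    k-means on the embedded vertices.
    """
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    _, eigenvectors = np.linalg.eigh(laplacian)
    # Skip the constant eigenvector (eigenvalue 0) and use the next k
    # eigenvectors: every vertex becomes a point in k-dimensional space.
    embedding = eigenvectors[:, 1:k + 1]
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
```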

Additionally, cut size may be the wrong quantity to minimize, since a good division is not just one with a small number of edges between communities.

Another objective function used for graph partitioning is conductance, which is the ratio between the number of cut edges and the volume of the smaller part.

The Cheeger bound guarantees that spectral bisection provides partitions with nearly optimal conductance.
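
A small helper of the kind one might write to evaluate a given cut against this objective (the function name and interface are invented for this sketch; the graph is assumed undirected and unweighted):

```python
import numpy as np

def conductance(adjacency: np.ndarray, in_part: np.ndarray) -> float:
    """Conductance of the cut (S, V \\ S): the number of cut edges divided by
    the smaller of the two volumes, where a part's volume is the sum of the
    degrees of its vertices."""
    degrees = adjacency.sum(axis=1)
    # Count edges with exactly one endpoint inside S.
    cut_edges = adjacency[np.ix_(in_part, ~in_part)].sum()
    volume_s = degrees[in_part].sum()
    volume_rest = degrees[~in_part].sum()
    return cut_edges / min(volume_s, volume_rest)

# Example: bisecting the earlier 5-vertex graph (triangle 0-1-2 joined by the
# edge 2-3 to the pendant path 3-4) into {0, 1, 2} and {3, 4}.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
mask = np.array([True, True, True, False, False])
print(conductance(A, mask))  # 1 cut edge, volumes 7 and 3 -> 1/3
```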

Graph partition can be useful for identifying the minimal set of nodes or links that should be immunized in order to stop epidemics.

Spin models have been used for clustering of multivariate data wherein similarities are translated into coupling strengths.

The properties of the ground-state spin configuration can be directly interpreted as communities.

Additionally, kernel-PCA-based spectral clustering can be cast in the form of a least-squares support vector machine framework, which makes it possible to project the data entries onto a kernel-induced feature space of maximal variance, implying a high separation between the projected communities.
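
As a rough, generic illustration of that idea (this uses plain kernel PCA followed by k-means on a synthetic two-moons dataset, not the specific weighted-kernel-PCA / least-squares-SVM formulation referred to above; the dataset and parameter values are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans

# Two interleaving half-circles: the communities are not linearly separable
# in the original input space.
points, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# Project onto the leading kernel principal components (RBF kernel), i.e. the
# directions of maximal variance in the kernel-induced feature space.
projected = KernelPCA(n_components=2, kernel="rbf", gamma=15).fit_transform(points)

# Cluster the projected points into two communities.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(projected)
print(np.bincount(labels))
```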

Some methods express graph partitioning as a multi-criteria optimization problem that can be solved using local methods expressed in a game-theoretic framework, where each node makes a decision on the partition it chooses.

For very large-scale distributed graphs, classical partition methods might not apply (e.g., spectral partitioning, Metis[7]), since they require full access to the graph data in order to perform global operations.

scikit-learn implements spectral clustering, with the partitioning determined from eigenvectors of the graph Laplacian matrix of the original graph computed by ARPACK or by the LOBPCG solver with multigrid preconditioning.
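
A minimal usage sketch with a precomputed affinity matrix (the tiny example graph is invented; in scikit-learn the eigen_solver argument selects ARPACK, LOBPCG, or LOBPCG with algebraic multigrid preconditioning, the last via the "amg" option, which requires the optional pyamg package):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Affinity (adjacency) matrix of a small undirected example graph.
affinity = np.array([[0, 1, 1, 0, 0],
                     [1, 0, 1, 0, 0],
                     [1, 1, 0, 1, 0],
                     [0, 0, 1, 0, 1],
                     [0, 0, 0, 1, 0]], dtype=float)

# eigen_solver can be "arpack", "lobpcg", or "amg" (LOBPCG with multigrid
# preconditioning; "amg" needs the pyamg dependency).
model = SpectralClustering(n_clusters=2,
                           affinity="precomputed",
                           eigen_solver="arpack",
                           assign_labels="kmeans",
                           random_state=0)
labels = model.fit_predict(affinity)
print(labels)  # community label for each of the five vertices
```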

One solver instantiates the multilevel approach in its most extreme version, removing only a single vertex in every level of the hierarchy.

By using this very fine-grained n-level approach combined with strong local search heuristics, it computes solutions of very high quality.

Another package uses recursive multilevel bisection and includes sequential as well as parallel partitioning techniques.

Jostle[22] is a sequential and parallel graph partitioning solver developed by Chris Walshaw.

The software packages DibaP[24] and its MPI-parallel variant PDibaP[25] by Meyerhenke implement the Bubble framework using diffusion; DibaP also uses AMG-based techniques for coarsening and solving linear systems arising in the diffusive approach.

Sanders and Schulz released a graph partitioning package, KaHIP[26] (Karlsruhe High Quality Partitioning), that implements, for example, flow-based methods, more-localized local searches and several parallel and sequential meta-heuristics.

The tools Parkway[27] by Trifunovic and Knottenbelt as well as Zoltan[28] by Devine et al. focus on hypergraph partitioning.

Figure 1: The graph G = (5,4), with 5 vertices and 4 edges, is analysed for spectral bisection. A linear combination of the eigenvectors of the two smallest eigenvalues yields the vector [1 1 1 1 1]', with eigenvalue 0.
Figure 2: The graph G = (5,5), with 5 vertices and 5 edges, illustrates how the Fiedler vector (in red) bisects the graph into two communities: vertices {1,2,3}, which have positive entries in the Fiedler vector, and vertices {4,5}, which have negative entries.
Figure 3: The weighted graph G may be partitioned to maximize the modularity Q, as in (a), or to minimize the ratio cut, as in (b). Partition (a) is better balanced, which motivates the importance of modularity in graph partitioning problems.