A graph data structure consists of a finite (and possibly mutable) set of vertices (also called nodes or points), together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph.
The vertices may be part of the graph structure, or may be external entities represented by integer indices or references.
A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).
The basic operations provided by a graph data structure G usually include:[1]

- adjacent(G, x, y): tests whether there is an edge from vertex x to vertex y;
- neighbors(G, x): lists all vertices y such that there is an edge from x to y;
- add_vertex(G, x): adds the vertex x, if it is not there;
- remove_vertex(G, x): removes the vertex x, if it is there;
- add_edge(G, x, y): adds the edge from vertex x to vertex y, if it is not there;
- remove_edge(G, x, y): removes the edge from vertex x to vertex y, if it is there.

Structures that associate values to the edges usually also provide:[1]

- get_edge_value(G, x, y): returns the value associated with the edge (x, y);
- set_edge_value(G, x, y, v): sets the value associated with the edge (x, y) to v.

The following table gives the time complexity cost of performing various operations on graphs, for each of these representations, with |V| the number of vertices and |E| the number of edges.

                    Adjacency list   Adjacency matrix   Incidence matrix
  Store graph       O(|V| + |E|)     O(|V|^2)           O(|V| * |E|)
  Add vertex        O(1)             O(|V|^2)           O(|V| * |E|)
  Add edge          O(1)             O(1)               O(|V| * |E|)
  Remove vertex     O(|E|)           O(|V|^2)           O(|V| * |E|)
  Remove edge       O(|E|)           O(1)               O(|V| * |E|)
  Test adjacency    O(|V|)           O(1)               O(|E|)
In the matrix representations, the entries encode the cost of following an edge.
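As an illustration, the basic operations listed above can be sketched with a classic adjacency-list implementation in Python. The class and method names follow the operation names in the text; the choice of an undirected graph and of per-edge values stored in a separate table are illustrative conventions, not a standard API.

```python
class Graph:
    """Undirected graph over hashable vertices, stored as adjacency lists."""

    def __init__(self):
        self._adj = {}          # vertex -> list of adjacent vertices
        self._edge_value = {}   # frozenset({x, y}) -> value on that edge

    def add_vertex(self, x):
        self._adj.setdefault(x, [])

    def remove_vertex(self, x):
        # Remove x and every edge incident to it.
        for y in self._adj.pop(x, []):
            self._adj[y].remove(x)

    def add_edge(self, x, y):
        self._adj.setdefault(x, []).append(y)
        self._adj.setdefault(y, []).append(x)

    def remove_edge(self, x, y):
        self._adj[x].remove(y)
        self._adj[y].remove(x)
        self._edge_value.pop(frozenset((x, y)), None)

    def adjacent(self, x, y):
        return y in self._adj.get(x, [])   # O(deg(x)) scan of x's list

    def neighbors(self, x):
        return list(self._adj.get(x, []))

    def set_edge_value(self, x, y, v):
        self._edge_value[frozenset((x, y))] = v

    def get_edge_value(self, x, y):
        return self._edge_value.get(frozenset((x, y)))
```

Note that the adjacency test here scans a list, which is what gives the adjacency-list column its O(|V|) worst-case adjacency query in the table above.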
Adjacency lists are generally preferred for the representation of sparse graphs, while an adjacency matrix is preferred if the graph is dense; that is, when the number of edges |E| is close to its maximum possible value |V|^2.
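The space trade-off behind this preference can be made concrete with a small sketch. The example graph (a path on 1000 vertices) is an arbitrary choice for illustration:

```python
# A sparse graph: a path on n vertices has only n - 1 edges.
n = 1000
edges = [(i, i + 1) for i in range(n - 1)]

# Adjacency matrix: always n * n stored entries, regardless of |E|.
matrix = [[False] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = True

# Adjacency lists: storage proportional to |V| + |E|.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

matrix_entries = n * n                   # 1,000,000 cells
list_entries = sum(len(l) for l in adj)  # 2 * (n - 1) = 1998 entries
```

For this sparse graph the matrix stores roughly 500 times more entries than the lists; for a dense graph the two would be comparable, and the matrix's O(1) adjacency test would win out.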
[5][6] The time complexity of operations in the adjacency list representation can be improved by storing the sets of adjacent vertices in more efficient data structures, such as hash tables or balanced binary search trees (the latter representation requires that vertices are identified by elements of a linearly ordered set, such as integers or character strings).
A representation of adjacent vertices via hash tables leads to an amortized average time complexity of O(1) to test adjacency of two given vertices and to remove an edge, and an amortized average time complexity[7] of O(|V|) to remove a given vertex x of the graph.
The time complexity of the other operations and the asymptotic space requirement do not change.
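As a sketch, such a hash-table-based adjacency structure can be written in Python as a dictionary of sets (the function names are illustrative):

```python
# Adjacency sets: a hash table mapping each vertex to a hash set of
# its neighbours. Membership tests and deletions on hash sets are
# amortized O(1) on average, improving on the list-scan adjacency test.
adj = {}

def add_edge(u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def adjacent(u, v):
    # Amortized O(1) on average: one hash lookup.
    return v in adj.get(u, set())

def remove_edge(u, v):
    # Amortized O(1) on average: two hash deletions.
    adj.get(u, set()).discard(v)
    adj.get(v, set()).discard(u)
```

Removing a vertex x still costs O(|V|) on average, since x must be discarded from the adjacency set of each of its neighbours.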
The parallelization of graph problems faces significant challenges: data-driven computations, unstructured problems, poor locality, and a high ratio of data access to computation.
[8][9] The graph representation used for parallel architectures plays a significant role in facing those challenges.
Poorly chosen representations may unnecessarily drive up the communication cost of the algorithm, which will decrease its scalability.
In the case of a shared memory model, the graph representations used for parallel processing are the same as in the sequential case,[10] since parallel read-only access to the graph representation (e.g. an adjacency list) is efficient in shared memory.
In the distributed memory model, the usual approach is to partition the vertex set V into p disjoint subsets V_0, ..., V_{p-1}, where p is the number of available processing elements (PEs). Each subset, together with the corresponding edges, is then distributed to the PE with the matching index.
Every PE has its own subgraph representation, where edges with an endpoint in another partition require special attention.
For standard communication interfaces like MPI, the ID of the PE owning the other endpoint has to be identifiable.
During the computation of a distributed graph algorithm, passing information along these edges implies communication.
This can be understood as a row-wise or column-wise decomposition of the adjacency matrix.
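A minimal sketch of such a 1D (row-wise) vertex partition, assuming a simple contiguous-block convention (the block-size formula below is one common choice, not something prescribed by the text):

```python
from math import ceil

def owner(v, n, p):
    """PE index owning vertex v, for n vertices block-partitioned over p PEs.

    Vertices are assigned in contiguous blocks of ceil(n / p), so each PE
    holds a consecutive band of rows of the adjacency matrix.
    """
    return v // ceil(n / p)

# Example: 10 vertices over 4 PEs -> blocks of sizes 3, 3, 3, 1.
owners = [owner(v, 10, 4) for v in range(10)]
```

An edge (u, v) whose endpoints fall into different blocks is exactly the kind of cut edge that requires communication between owner(u) and owner(v).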
For algorithms operating on this representation, this requires an All-to-All communication step as well as message buffer sizes of up to O(|E|), as each PE potentially has outgoing edges to every other PE. An alternative is a 2D partitioning, in which the PEs are arranged in a rectangular grid and each PE is assigned a submatrix of the adjacency matrix.[12] Therefore, each processing unit can only have outgoing edges to PEs in the same row and column, which bounds the number of its communication partners.
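A 2D partitioning can be sketched as follows, under the simplifying (and purely illustrative) assumption that the grid dimensions p_r and p_c divide the vertex count n evenly:

```python
def edge_owner(u, v, n, p_r, p_c):
    """PE owning matrix entry (u, v) when the adjacency matrix is cut
    into (n / p_r) x (n / p_c) blocks over a p_r x p_c grid of PEs.

    Assumes p_r and p_c divide n evenly (illustrative simplification).
    """
    row = u // (n // p_r)          # grid row of the block holding entry (u, v)
    col = v // (n // p_c)          # grid column of that block
    return row * p_c + col         # linearize the grid position to a PE id

# Example: 8 vertices on a 2 x 4 grid of PEs.
example = edge_owner(5, 1, 8, 2, 4)
```

A PE at grid position (r, c) then only ever exchanges data with PEs sharing its row or its column, i.e. at most p_r + p_c - 1 partners out of p_r * p_c.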
Graphs with trillions of edges occur in machine learning, social network analysis, and other areas.
Compressed graph representations have been developed to reduce I/O and memory requirements.
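One widely used compact layout is CSR (compressed sparse row), in which all adjacency lists are concatenated into a single array with an offsets array marking where each vertex's neighbours begin. CSR is a compact representation rather than a full compression scheme (dedicated schemes go further with gap and reference encoding), but the sketch below, with illustrative function names, shows the basic idea:

```python
def to_csr(n, edges):
    """Build CSR arrays (offsets, targets) for a directed graph
    with n vertices given as a list of (source, target) pairs."""
    # Count the out-degree of each vertex, shifted by one position.
    offsets = [0] * (n + 1)
    for u, _ in edges:
        offsets[u + 1] += 1
    # Prefix-sum the counts: offsets[u] is where u's neighbours start.
    for i in range(n):
        offsets[i + 1] += offsets[i]
    # Scatter the edge targets into one flat array.
    targets = [0] * len(edges)
    pos = offsets[:-1].copy()
    for u, v in edges:
        targets[pos[u]] = v
        pos[u] += 1
    return offsets, targets

offsets, targets = to_csr(4, [(0, 1), (0, 2), (2, 3), (3, 0)])
# The neighbours of vertex u are targets[offsets[u]:offsets[u+1]].
```

Two flat arrays of sizes |V| + 1 and |E| replace per-vertex list objects, which both shrinks memory overhead and makes sequential scans over edges I/O-friendly.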