In computer science and graph theory, the term color-coding refers to an algorithmic technique which is useful in the discovery of network motifs.
For example, it can be used to detect a simple path of length k in a given graph.
The traditional color-coding algorithm is probabilistic, but it can be derandomized without much overhead in the running time.
Color-coding also applies to the detection of cycles of a given length, and more generally it applies to the subgraph isomorphism problem (an NP-complete problem), where it yields polynomial time algorithms when the subgraph pattern that it is trying to detect has bounded treewidth.
The color-coding method was proposed and analyzed in 1994 by Noga Alon, Raphael Yuster, and Uri Zwick.
[1][2] The following results can be obtained through the method of color-coding: To solve the problem of finding a subgraph
, the method of color-coding begins by randomly coloring each vertex of G with
This method works by repeating (1) random coloring a graph and (2) finding colorful copy of the target subgraph, and eventually the target subgraph can be found if the process is repeated a sufficient number of times.
Then the expected time to find a copy of H in G, if one exists, is
An example would be finding a simple cycle of length k in graph G = (V, E).
By applying random coloring method, each simple cycle has a probability of
overall time to find a simple cycle of length k in G. The colorful cycle-finding algorithm works by first finding all pairs of vertices in V that are connected by a simple path of length k − 1, and then checking whether the two vertices in each pair are connected.
Note that V can be divided into V1 and V2 accordingly, and let G1 and G2 denote the subgraphs induced by V1 and V2 respectively.
Then, recursively find colorful paths of length
Suppose the boolean matrix A1 and A2 represent the connectivity of each pair of vertices in G1 and G2 by a colorful path, respectively, and let B be the matrix describing the adjacency relations between vertices of V1 and those of V2, the boolean product
gives all pairs of vertices in V that are connected by a colorful path of length k − 1.
Although this algorithm finds only the end points of the colorful path, another algorithm by Alon and Naor[4] that finds colorful paths themselves can be incorporated into it.
For the target subgraph H in G to be discoverable, the enumeration has to include at least one instance where the H is colorful.
To achieve this, enumerating a k-perfect family F of hash functions from {1, ..., |V|} to {1, ..., k} is sufficient.
There are several approaches to construct such a k-perfect hash family: In the case of derandomizing well-coloring, where each vertex on the subgraph is colored consecutively, a k-perfect family of hash functions from {1, ..., |V|} to {1, ..., k!}
A sufficient k-perfect family which maps from {1, ..., |V|} to {1, ..., kk} can be constructed in a way similar to the approach 3 above (the first step).
In particular, it is done by using nklog k random bits that are almost klog k independent, and the size of the resulting k-perfect family will be
The derandomization of color-coding method can be easily parallelized, yielding efficient NC algorithms.
Recently, color-coding has attracted much attention in the field of bioinformatics.
One example is the detection of signaling pathways in protein-protein interaction (PPI) networks.
Another example is to discover and to count the number of motifs in PPI networks.
Studying both signaling pathways and motifs allows a deeper understanding of the similarities and differences of many biological functions, processes, and structures among organisms.
Due to the huge amount of gene data that can be collected, searching for pathways or motifs can be highly time consuming.
However, by exploiting the color-coding method, the motifs or signaling pathways with
Thus, this enables us to explore more complex or larger structures in PPI networks.