Double Cut and Join Model

The double cut and join (DCJ) model is a model for genome rearrangement used to define an edit distance between genomes based on gene order and orientation, rather than nucleotide sequence.

It takes the fundamental units of a genome to be synteny blocks, maximal sections of DNA conserved between genomes.

It focuses on changes due to genome rearrangement operations such as inversions and translocations, as well as the creation and absorption of circular intermediates.[1][2]

A genome is described as a directed, edge-labeled graph in which each vertex has degree 1 or 2.

Edges are labeled with synteny blocks, vertices of degree 1 represent telomeres, and vertices of degree 2 represent adjacencies between blocks.

Because no vertex has degree greater than 2, the genome is a disjoint union of paths and cycles.

Each connected component is called a chromosome: a path is a linear chromosome and a cycle is a circular chromosome.

The beginning of each edge is called its tail and the end its head; heads and tails are collectively known as extremities.

Vertices are described by the extremities they contain. For instance, in the figure, the adjacency formed by the head of marker 1 and the tail of marker 2 is labelled (h1, t2), and the telomere formed by the head of marker 2 is (h2).
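The following is a minimal sketch of how the adjacencies and telomeres of a genome can be listed using the labels above. It assumes a hypothetical input format: each chromosome is a list of signed block numbers, where a negative sign means the block is read head to tail, together with a flag that marks the chromosome as circular. The function names `extremities` and `genome_vertices` are illustrative and do not come from any library. Each vertex is stored as a Python `frozenset` of one or two extremity labels.

```python
def extremities(block):
    """Return the (left, right) extremities of a signed block as read along a chromosome."""
    b = abs(block)
    tail, head = f"t{b}", f"h{b}"
    # A forward block is read tail -> head; a reversed block is read head -> tail.
    return (tail, head) if block > 0 else (head, tail)


def genome_vertices(chromosomes):
    """Build the vertex set of a genome.

    chromosomes: list of (blocks, circular) pairs, blocks being signed block
    numbers (a hypothetical input format). Returns a set of frozensets: a
    two-element set is an adjacency, a one-element set is a telomere.
    """
    vertices = set()
    for blocks, circular in chromosomes:
        ends = [extremities(b) for b in blocks]
        # Join the right extremity of each block to the left extremity of the next.
        for (_, right), (left, _) in zip(ends, ends[1:]):
            vertices.add(frozenset({right, left}))
        if circular:
            # A circular chromosome also joins its last block back to its first.
            vertices.add(frozenset({ends[-1][1], ends[0][0]}))
        else:
            # A linear chromosome ends in a telomere on each side.
            vertices.add(frozenset({ends[0][0]}))
            vertices.add(frozenset({ends[-1][1]}))
    return vertices


print(sorted(sorted(v) for v in genome_vertices([([1, 2], False)])))
# [['h1', 't2'], ['h2'], ['t1']]
```

For the linear chromosome 1 2, the output consists of the telomere (t1), the adjacency (h1, t2), and the telomere (h2).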

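A double cut and join (DCJ) operation consists of one of the following four transformations:

- Adjacencies (a, b) and (c, d) are replaced with adjacencies (a, c) and (b, d), or with (a, d) and (b, c).
- An adjacency (a, b) and a telomere (c) are replaced with the adjacency (a, c) and the telomere (b), or with the adjacency (b, c) and the telomere (a).
- Two telomeres (a) and (b) are replaced with the adjacency (a, b).
- An adjacency (a, b) is replaced with the two telomeres (a) and (b).

An edit distance, the double cut and join distance $d_{DCJ}(A, B)$, is defined between genomes $A$ and $B$ with the same set of edges as the minimum number of DCJ operations needed to transform $A$ into $B$.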
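The definition can be checked directly on very small genomes. The Python sketch below assumes that a genome is a `frozenset` of vertices, each vertex being a `frozenset` of one or two extremity labels such as `"h1"`. It enumerates the four DCJ transformations and finds the minimum number of operations by breadth-first search. The names `dcj_neighbors` and `dcj_distance_bfs` are hypothetical. The search is exponential in the number of blocks and is meant only to illustrate the definition.

```python
from collections import deque
from itertools import combinations


def dcj_neighbors(genome):
    """Yield every genome reachable from `genome` by one DCJ operation.

    `genome` is a frozenset of vertices; each vertex is a frozenset holding one
    extremity (a telomere) or two extremities (an adjacency).
    """
    vertices = list(genome)
    # Cut: adjacency (a, b) -> telomeres (a) and (b).
    for v in vertices:
        if len(v) == 2:
            a, b = sorted(v)
            yield (genome - {v}) | {frozenset({a}), frozenset({b})}
    for u, v in combinations(vertices, 2):
        rest = genome - {u, v}
        if len(u) == 1 and len(v) == 1:
            # Join: telomeres (a) and (b) -> adjacency (a, b).
            yield rest | {u | v}
        elif len(u) == 2 and len(v) == 2:
            # Two adjacencies (a, b), (c, d) -> (a, c), (b, d) or (a, d), (b, c).
            a, b = sorted(u)
            c, d = sorted(v)
            yield rest | {frozenset({a, c}), frozenset({b, d})}
            yield rest | {frozenset({a, d}), frozenset({b, c})}
        else:
            # Adjacency (a, b) and telomere (c) -> (a, c), (b) or (b, c), (a).
            adj, tel = (u, v) if len(u) == 2 else (v, u)
            a, b = sorted(adj)
            (c,) = tel
            yield rest | {frozenset({a, c}), frozenset({b})}
            yield rest | {frozenset({b, c}), frozenset({a})}


def dcj_distance_bfs(source, target):
    """Minimum number of DCJ operations from source to target, by breadth-first search."""
    source, target = frozenset(source), frozenset(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        genome, dist = queue.popleft()
        if genome == target:
            return dist
        for nxt in dcj_neighbors(genome):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target unreachable: the genomes do not have the same extremities


a = {frozenset({"t1"}), frozenset({"h1", "t2"}), frozenset({"h2"})}
b = {frozenset({"t1"}), frozenset({"h1"}), frozenset({"t2"}), frozenset({"h2"})}
print(dcj_distance_bfs(a, b))  # 1
```

In this example, cutting the adjacency (h1, t2) of the linear chromosome 1 2 splits it into two single-block chromosomes in one operation, so the search returns 1.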

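The DCJ distance defines a metric space. (Here $A$, $B$ and $G$ are genomes with the same edges; that the distance between any two such genomes is finite is shown below.) Clearly $d_{DCJ}(A, B) \ge 0$, with equality if and only if $A = B$. Every DCJ operation can be undone by a single DCJ operation (for instance, joining two telomeres into an adjacency is undone by cutting that adjacency), so $d_{DCJ}(A, B) = d_{DCJ}(B, A)$.

The triangle inequality $d_{DCJ}(A, G) \le d_{DCJ}(A, B) + d_{DCJ}(B, G)$ holds because a series of DCJ operations transforming $A$ into $B$, followed by a series of transformations from $B$ to $G$, is a series of operations transforming $A$ into $G$. Therefore, the minimal number of operations needed to transform $A$ into $G$ is at most $d_{DCJ}(A, B) + d_{DCJ}(B, G)$.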

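To compute the DCJ distance between two genomes $A$ and $B$ with the same set of synteny blocks, we construct a bipartite multigraph known as the adjacency graph $AG(A, B)$.

The vertex set of the adjacency graph is the disjoint union $V(A) \cup V(B)$ of the adjacencies and telomeres of the two genomes. For each $u \in V(A)$ and $v \in V(B)$, there is one edge between $u$ and $v$ for every extremity $x$ with $x \in u$ and $x \in v$, that is, for every end of a synteny block that both vertices contain. In particular, if $u$ and $v$ share two extremities, we add two edges between $u$ and $v$.

Every vertex of the adjacency graph therefore has degree equal to its number of extremities, 1 or 2, so $AG(A, B)$ is itself a disjoint union of paths and cycles.

Taking $A = B$, we see that the adjacency graph is composed entirely of paths of length 1, connecting two telomeres, and cycles of length 2, connecting two adjacencies.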
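The following is a minimal Python sketch of this construction. It assumes that both genomes are given as collections of `frozenset` vertices over the same extremity labels (for example `"t1"`, `"h1"`). The hypothetical function `adjacency_graph` returns the multigraph as an edge list and tags each endpoint with `"A"` or `"B"`, so that identical vertices in the two genomes remain distinct.

```python
def adjacency_graph(genome_a, genome_b):
    """Return AG(A, B) as a list of edges between tagged vertices.

    Each genome is a collection of frozensets of extremity labels; both genomes
    are assumed to contain exactly the same extremities. Endpoints are tagged
    "A" or "B" so that identical vertices in the two genomes remain distinct.
    """
    # Look up, for each extremity, the vertex of B that contains it.
    vertex_in_b = {x: v for v in genome_b for x in v}
    edges = []
    for u in genome_a:
        for x in u:
            # One edge per shared extremity, so a vertex pair sharing two
            # extremities is joined by two parallel edges.
            edges.append((("A", u), ("B", vertex_in_b[x])))
    return edges


a = {frozenset({"t1"}), frozenset({"h1", "t2"}), frozenset({"h2"})}
print(len(adjacency_graph(a, a)))  # 4
```

When both arguments are the same genome, every edge joins a vertex to its own copy. Each telomere then contributes a single edge (a path of length 1), and each adjacency contributes two parallel edges (a cycle of length 2).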

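Let $N$ be the number of synteny blocks in genomes $A$ and $B$, let $C$ be the number of cycles in their adjacency graph, and let $I$ be the number of odd paths (paths with an odd number of edges) in their adjacency graph. Then

$$d_{DCJ}(A, B) = N - \left(C + \frac{I}{2}\right).$$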
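As a worked example (chosen here for illustration, not taken from the figure), let $A$ be a single linear chromosome carrying blocks 1 and 2, with vertices (t1), (h1, t2), (h2). Let $B$ consist of the same blocks as two one-block linear chromosomes, with vertices (t1), (h1), (t2), (h2). The adjacency graph has three components, and only the two odd paths count toward $I$:

$$
\begin{aligned}
&(t_1)_A \leftrightarrow (t_1)_B && \text{path of length 1 (odd)}\\
&(t_2)_B \leftrightarrow (h_1, t_2)_A \leftrightarrow (h_1)_B && \text{path of length 2 (even)}\\
&(h_2)_A \leftrightarrow (h_2)_B && \text{path of length 1 (odd)}\\[4pt]
&N = 2,\quad C = 0,\quad I = 2, && d_{DCJ}(A, B) = 2 - \left(0 + \tfrac{2}{2}\right) = 1.
\end{aligned}
$$

This agrees with the single DCJ operation that replaces the adjacency (h1, t2) with the telomeres (h1) and (t2). Counting all three paths instead of only the odd ones would give the non-integer value $2 - 3/2$.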

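The proof shows that each DCJ operation can decrease $N - (C + I/2)$ by at most one, and that whenever $N - (C + I/2) > 0$, there exists an operation decreasing it by exactly one. Since $N - (C + I/2) = 0$ exactly when $A = B$, this shows that $d_{DCJ}(A, B)$ is always defined, and gives a method for its calculation.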

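Since the cycles and odd paths of the adjacency graph can be counted with a single traversal of its $O(N)$ vertices and $2N$ edges, $d_{DCJ}(A, B)$ can be found in linear time.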
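The following is a minimal Python sketch of the linear-time computation. It assumes that each genome is a set of `frozenset` vertices over the same extremity labels (for example `"h1"`, `"t2"`). The hypothetical function `dcj_distance` visits each component of the adjacency graph once without building an edge list. A vertex's degree equals its number of extremities, so a component with as many edges as vertices is a cycle. Any other component is a path with one fewer edge than it has vertices.

```python
def dcj_distance(genome_a, genome_b):
    """DCJ distance N - (C + I/2), computed from the adjacency graph AG(A, B).

    Genomes are sets of frozensets of extremity labels (size 2 = adjacency,
    size 1 = telomere) and are assumed to contain exactly the same extremities.
    """
    # For each side, map every extremity to the vertex containing it.
    vertex_in = {
        "A": {x: v for v in genome_a for x in v},
        "B": {x: v for v in genome_b for x in v},
    }
    other = {"A": "B", "B": "A"}
    n_blocks = len(vertex_in["A"]) // 2  # each synteny block has two extremities
    cycles = odd_paths = 0
    seen = set()
    for start in [("A", v) for v in genome_a] + [("B", v) for v in genome_b]:
        if start in seen:
            continue
        # Iterative depth-first search over one connected component.
        seen.add(start)
        stack = [start]
        n_nodes = degree_sum = 0
        while stack:
            side, vertex = stack.pop()
            n_nodes += 1
            degree_sum += len(vertex)  # one edge per extremity in the vertex
            for x in vertex:
                nbr = (other[side], vertex_in[other[side]][x])
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        n_edges = degree_sum // 2
        if n_edges == n_nodes:
            cycles += 1     # every vertex has degree 2: a cycle
        elif n_edges % 2 == 1:
            odd_paths += 1  # a path with an odd number of edges
    # I is always even, so integer division gives the exact value of N - (C + I/2).
    return n_blocks - cycles - odd_paths // 2


a = {frozenset({"t1"}), frozenset({"h1", "t2"}), frozenset({"h2"})}
b = {frozenset({"t1"}), frozenset({"h1"}), frozenset({"t2"}), frozenset({"h2"})}
print(dcj_distance(a, b))  # 1
```

Because it relies on dictionary lookups, the traversal runs in expected time linear in $N$. The integer division is exact because each odd path has exactly one telomere of $A$ as an endpoint, and a genome always has an even number of telomeres.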

A, B: Two genomes with four synteny blocks. C: The adjacency graph of the genomes pictured in A and B.