Cartesian tree

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.
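This defining property can be illustrated with a short Python sketch (the example sequence and function names are illustrative, and this direct recursive construction takes quadratic time rather than the linear time achievable by the methods described below):

```python
def cartesian_tree(seq):
    """Return the Cartesian tree of seq as nested tuples
    (value, left_subtree, right_subtree), following the definition
    directly: the root is the minimum of the sequence, and the
    subsequences to its left and right form the two subtrees."""
    if not seq:
        return None
    i = seq.index(min(seq))  # root = position of the minimum
    return (seq[i], cartesian_tree(seq[:i]), cartesian_tree(seq[i + 1:]))

def inorder(tree):
    """Symmetric (in-order) traversal; recovers the original sequence."""
    if tree is None:
        return []
    value, left, right = tree
    return inorder(left) + [value] + inorder(right)

seq = [9, 3, 7, 1, 8, 12, 10]
t = cartesian_tree(seq)
assert inorder(t) == seq  # in-order traversal returns the input
```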

Cartesian trees were introduced by Vuillemin (1980) in the context of geometric range searching data structures.

They have also been used in the definition of the treap and randomized binary search tree data structures for binary search problems, in comparison sort algorithms that perform efficiently on nearly-sorted inputs, and as the basis for pattern matching algorithms.

If a sequence of numbers contains repetitions, a Cartesian tree can still be determined for it by following a consistent tie-breaking rule (for instance, treating the earlier of two equal values as the smaller) before constructing the tree.

[2] Cartesian trees were introduced and named by Vuillemin (1980), who used them as an example of the interaction between geometric combinatorics and the design and analysis of data structures.

In particular, Vuillemin used these structures to analyze the average-case complexity of concatenation and splitting operations on binary search trees.

The name is derived from the Cartesian coordinate system for the plane: in one version of this structure, as in the two-dimensional range searching application discussed below, a Cartesian tree for a point set has the sorted order of the points by their x-coordinates as its symmetric traversal order, and it has the heap property according to the y-coordinates of the points.

Vuillemin described both this geometric version of the structure, and the definition here in which a Cartesian tree is defined from a sequence.

Using sequences instead of point coordinates provides a more general setting that allows the Cartesian tree to be applied to non-geometric problems as well.

One construction method processes the sequence values in left-to-right order, maintaining the Cartesian tree of the nodes processed so far in a structure that allows both upward and downward traversal of the tree. To process each new value, start at the node representing the previous value in the sequence and follow the path from this node to the root of the tree until finding a value smaller than the new one. The new node becomes the right child of that smaller node, and the previous right child of the smaller node becomes the new node's left child. The total time for this procedure is linear, because each node can be removed from the rightmost path of the tree at most once.
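In practice this walk up the rightmost path is often implemented with a stack; a minimal Python sketch, with illustrative names, returning the tree as arrays of child and parent links:

```python
def build_cartesian_tree(seq):
    """Left-to-right linear-time construction of a (min-heap) Cartesian
    tree.  The stack holds the indices on the rightmost path of the tree
    built so far; the total time is linear because each index is pushed
    and popped at most once.  Assumes a non-empty sequence."""
    n = len(seq)
    parent = [-1] * n
    left = [-1] * n
    right = [-1] * n
    stack = []                     # indices on the current rightmost path
    for i, x in enumerate(seq):
        last = -1
        while stack and seq[stack[-1]] > x:   # pop larger values
            last = stack.pop()
        if last != -1:             # popped subtree becomes the left child
            left[i] = last
            parent[last] = i
        if stack:                  # nearest smaller value becomes the parent
            right[stack[-1]] = i
            parent[i] = stack[-1]
        stack.append(i)
    root = stack[0]                # bottom of the stack is the minimum
    return root, left, right, parent
```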

[3] An alternative linear-time construction algorithm is based on the all nearest smaller values problem.

In this construction, the parent of each value in the Cartesian tree is whichever of its nearest smaller left neighbor and nearest smaller right neighbor has the larger value. The sequence of left neighbors can be found by an algorithm that maintains a stack containing a subsequence of the input; a symmetric right-to-left pass finds the right neighbors.
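A Python sketch of the stack pass for nearest smaller left neighbors (function and variable names are illustrative; a mirror-image pass over the reversed sequence yields the right neighbors):

```python
def nearest_smaller_left(seq):
    """All nearest smaller values, left direction: for each position,
    return the index of the closest earlier element with a smaller
    value, or -1 if none exists.  The stack holds the indices of an
    increasing subsequence of the elements seen so far, so each
    element is pushed and popped at most once (linear total time)."""
    result = []
    stack = []
    for i, x in enumerate(seq):
        while stack and seq[stack[-1]] >= x:
            stack.pop()
        result.append(stack[-1] if stack else -1)
        stack.append(i)
    return result
```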

[5] Yet another linear-time algorithm, using a linked list representation of the input sequence, is based on locally maximum linking: the algorithm repeatedly identifies a local maximum element, i.e., one that is larger than both its neighbors (or than its only neighbor, in case it is the first or last element in the list). The identified element is then removed from the list and attached as the right child of its left neighbor or as the left child of its right neighbor, whichever of the two neighbors has the larger value.

This process can be implemented in a single left-to-right pass of the input. Each element can gain at most one left child and at most one right child, and the resulting binary tree is a Cartesian tree of the input sequence.
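The following Python sketch implements the locally-maximum-linking idea, with array-based neighbor links standing in for the linked list. For simplicity it rescans for local maxima, so this illustrative version runs in quadratic time rather than the single-pass linear time described above; it assumes a non-empty sequence of distinct values:

```python
def local_max_linking(seq):
    """Build a (min-heap) Cartesian tree by repeatedly removing a local
    maximum from a doubly linked list of the remaining elements and
    attaching it below whichever of its two neighbors has the larger
    value.  Returns the root index and arrays of child links."""
    n = len(seq)
    left = [-1] * n
    right = [-1] * n
    prev = list(range(-1, n - 1))        # doubly linked list of survivors
    nxt = list(range(1, n + 1))
    nxt[-1] = -1
    head = 0
    remaining = n
    while remaining > 1:
        i = head
        while i != -1:                   # scan for a local maximum
            p, q = prev[i], nxt[i]
            lv = seq[p] if p != -1 else float('-inf')
            rv = seq[q] if q != -1 else float('-inf')
            if seq[i] > lv and seq[i] > rv:
                if lv > rv:              # attach below the larger neighbor
                    right[p] = i
                else:
                    left[q] = i
                if p != -1:              # splice i out of the list
                    nxt[p] = q
                else:
                    head = q
                if q != -1:
                    prev[q] = p
                remaining -= 1
                break
            i = q
    return head, left, right             # head is the root (the minimum)
```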

[8] Cartesian trees form part of an efficient data structure for range minimum queries: the minimum value within any contiguous subsequence can be found at the lowest common ancestor, in the Cartesian tree, of the nodes for the subsequence's two endpoints.

Because lowest common ancestors can be found in constant time per query, using a data structure that takes linear space to store and can be constructed in linear time, the same bounds hold for the range minimization problem.
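As an illustration of the reduction (not of the constant-time lowest common ancestor structure itself), the following sketch answers a range minimum query by walking parent pointers in the Cartesian tree; names are illustrative, and the input is assumed non-empty:

```python
def range_min_query_structure(seq):
    """Preprocess seq into its Cartesian tree and return a query
    function that answers range minimum queries as lowest common
    ancestors.  The LCA here is found by walking parent pointers,
    so queries are not constant time; substituting a constant-time
    LCA structure yields the linear-space, constant-time bound."""
    n = len(seq)
    parent = [-1] * n
    stack = []
    for i in range(n):                   # standard stack construction
        last = -1
        while stack and seq[stack[-1]] > seq[i]:
            last = stack.pop()
        if last != -1:
            parent[last] = i
        if stack:
            parent[i] = stack[-1]
        stack.append(i)

    def depth(i):
        d = 0
        while parent[i] != -1:
            i = parent[i]
            d += 1
        return d

    def rmq(i, j):
        """Index of the minimum of seq[i..j]: the LCA of nodes i and j."""
        while i != j:                    # walk the deeper node upward
            if depth(i) >= depth(j):
                i = parent[i]
            else:
                j = parent[j]
        return i

    return rmq
```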

A three-sided range query, in which the task is to list all points within a region bounded by three inequalities of the form L ≤ x ≤ R and y ≤ T, may be answered by finding the bottommost point b in the slab L ≤ x ≤ R (the lowest common ancestor, in the Cartesian tree, of the leftmost and rightmost points in the slab), comparing its y-coordinate against T, and (if the point lies within the three-sided region) continuing recursively in the two slabs bounded between L and b and between b and R.

[3] The same construction, of lowest common ancestors in a Cartesian tree, makes it possible to construct a data structure with linear space that allows the distances between pairs of points in any ultrametric space to be queried in constant time per query.

The distance within an ultrametric is the same as the minimax path weight in the minimum spanning tree of the metric.

[13] The Cartesian tree of a sorted sequence is just a path graph, rooted at its leftmost endpoint.

[14] This idea of balancing a binary search tree by priorities was applied by Seidel & Aragon (1996), who suggested the use of random numbers as priorities.

With random priorities, each node of the resulting treap has logarithmic expected depth, and the whole tree has logarithmic depth (its maximum root-to-leaf distance) with high probability.

It is also possible, as suggested by Aragon and Seidel, to reprioritize frequently-accessed nodes, causing them to move towards the root of the treap and speeding up future accesses for the same keys.
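A minimal treap sketch along these lines, using rotations to restore the heap ordering of randomly chosen priorities after each binary-search-tree insertion (the class and function names are illustrative):

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random priority -> balance w.h.p.
        self.left = None
        self.right = None

def insert(root, key):
    """Treap insertion: the structure is a Cartesian tree over
    (key, priority) pairs, ordered as a search tree by key and as a
    min-heap by priority.  A standard BST insertion is followed by
    rotations that restore the heap property."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.priority < root.priority:    # rotate right
            new_root = root.left
            root.left = new_root.right
            new_root.right = root
            return new_root
    else:
        root.right = insert(root.right, key)
        if root.right.priority < root.priority:   # rotate left
            new_root = root.right
            root.right = new_root.left
            new_root.left = root
            return new_root
    return root

def keys_inorder(root):
    """Symmetric traversal: yields the keys in sorted order."""
    if root is None:
        return []
    return keys_inorder(root.left) + [root.key] + keys_inorder(root.right)
```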

[14] Levcopoulos & Petersson (1989) describe a sorting algorithm based on Cartesian trees.

The Levcopoulos–Petersson algorithm can be viewed as a version of selection sort or heap sort that maintains a priority queue of candidate minima, and that at each step finds and removes the minimum value in this queue, moving this value to the end of an output sequence.

In their algorithm, the priority queue consists only of elements whose parent in the Cartesian tree has already been found and removed.

Thus, the algorithm consists of the following steps:[16]

1. Construct a Cartesian tree for the input sequence.
2. Initialize a priority queue, initially containing only the tree root.
3. While the priority queue is non-empty, find and remove the minimum value in it, add that value to the end of the output sequence, and add the Cartesian tree children of the removed value to the priority queue.

As Levcopoulos and Petersson show, for input sequences that are already nearly sorted, the size of the priority queue will remain small, allowing this method to take advantage of the nearly-sorted input and run more quickly.
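A Python sketch of this sorting method, using a binary heap as the priority queue (names are illustrative; the point of the analysis is that for nearly-sorted inputs the queue stays small):

```python
import heapq

def cartesian_tree_sort(seq):
    """Levcopoulos-Petersson sort sketch: build the Cartesian tree,
    then repeatedly extract the minimum from a priority queue that
    holds only nodes whose parent has already been output; when a
    node is output, its children become candidates."""
    n = len(seq)
    if n == 0:
        return []
    left = [-1] * n
    right = [-1] * n
    stack = []
    for i in range(n):                   # linear-time stack construction
        last = -1
        while stack and seq[stack[-1]] > seq[i]:
            last = stack.pop()
        if last != -1:
            left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0]

    output = []
    queue = [(seq[root], root)]          # start from the root (the minimum)
    while queue:
        value, i = heapq.heappop(queue)
        output.append(value)
        for child in (left[i], right[i]):
            if child != -1:
                heapq.heappush(queue, (seq[child], child))
    return output
```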

[16] The problem of Cartesian tree matching has been defined as a generalized form of string matching in which one seeks a substring (or in some cases, a subsequence) of a given string that has a Cartesian tree of the same form as a given pattern.

Fast algorithms for variations of the problem with a single pattern or multiple patterns have been developed, as well as data structures analogous to the suffix tree and other text indexing structures.

A sequence of numbers and the Cartesian tree derived from them.
Two-dimensional range-searching using a Cartesian tree: the bottom point (red in the figure) within a three-sided region with two vertical sides and one horizontal side (if the region is nonempty) can be found as the nearest common ancestor of the leftmost and rightmost points (the blue points in the figure) within the slab defined by the vertical region boundaries. The remaining points in the three-sided region can be found by splitting it by a vertical line through the bottom point and recursing.
Pairs of consecutive sequence values (shown as the thick red edges) that bracket a sequence value (the darker blue point). The cost of including this value in the sorted order produced by the Levcopoulos–Petersson algorithm is proportional to the logarithm of this number of bracketing pairs.