Zipper (data structure)

A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages.

[1] It includes and generalizes the gap buffer technique sometimes used with arrays.

The zipper technique is general in the sense that it can be adapted to lists, trees, and other recursively defined data structures.

A layperson's explanation for a tree with zipper would be an ordinary computer filesystem with operations to go to parent (often cd ..), and the possibility to go downwards (cd subdirectory).

Behind the scenes the zippers are efficient when making (functional) changes to a data structure, where a new, slightly changed, data structure is returned from an edit operation (instead of making a change in the current data structure).

Many common data structures in computer science can be expressed as the structure generated by a few primitive constructor operations or observer operations.

This recording together with the location is called a zipped representation of the list or a list-zipper.

To be clear, a location in the list is not just the number of Cons operations, but also all of the other information about those Cons; in this case, the values that must be reconnected.

Here, these values may be conveniently represented in a separate list in the order of application from the target location.

Specifically, from the context of "3" in the list [1, 2, 3, 4], a recording (commonly referred to as a 'path') could be represented as [2, 1] where Cons(2, L) is applied followed by (Cons 1, L) to reconstitute the original list starting from [3, 4].

A list-zipper always represents the entire data structure.

However, this information is from the perspective of a specific location within that data structure.

Consequently, a list-zipper is a pair consisting of both the location as a context or starting point, and a recording or path that permits reconstruction from that starting location.

The list may then be efficiently reconstructed: [1, 2, 10, 4] or other locations traversed to.

With the list represented this way, it is easy to define relatively efficient operations on immutable data structures such as Lists and Trees at arbitrary locations.

In particular, applying the zipper transform to a tree makes it easy to insert or remove values at any particular location in the tree.

The type of a zipper's contexts can be constructed via an operation on the original type that is closely related to the derivative of calculus through decategorification.

The recursive types that zippers are formed from can be viewed as the least fixed point of a unary type constructor of kind

that constructs the least fixed point of its argument, an unlabeled binary tree can be represented as

In this way, the derivative can be viewed as a zipper with a hole in it, where a value could be placed.

The two addends correspond to the two data constructors for a list.

, and the Cons constructor is represented as a product of the head element

Then the formal type of a zipper for a linked list is

indicates which branch (left or right) contains the hole, and the zipper contains a value for the root, a

The complete type of a zipper for this binary tree structure, then, is

The zipper is often used where there is some concept of focus or a cursor used to navigate around in some set of data, since its semantics reflect that of moving around but in a functional non-destructive manner.

The zipper has been used in In a non-purely-functional programming language, it may be more convenient to simply traverse the original data structure and modify it in place (perhaps after deep cloning it, to avoid affecting other code that might hold a reference to it).

The generic zipper[6][7][8] is a technique to achieve the same goal as the conventional zipper by capturing the state of the traversal in a continuation while visiting each node.

(The Haskell code given in the reference uses generic programming to generate a traversal function for any data structure, but this is optional – any suitable traversal function can be used.)

However, the generic zipper involves inversion of control, so some uses of it require a state machine (or equivalent) to keep track of what to do next.