Collation

Collation is a fundamental element of most office filing systems, library catalogs, and reference books.

Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order).
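The distinction between a total order on sort keys and a total preorder on items can be illustrated with a minimal Python sketch. The records and the `sort_key` function are hypothetical, chosen only for illustration; here the sort key is simply the surname.

```python
from itertools import groupby

# Hypothetical items of information: (surname, document) pairs.
items = [("Smith", "invoice"), ("Jones", "letter"), ("Smith", "memo")]

def sort_key(item):
    """Return the identifier (sort key) under which the item is filed."""
    return item[0]

# Sorting by key places every "Jones" item before every "Smith" item.
ordered = sorted(items, key=sort_key)

# Items sharing a key form one group; the collation method itself
# says nothing about their relative order inside the group.
groups = [(key, list(group)) for key, group in groupby(ordered, key=sort_key)]
print(groups)
```

Python's `sorted` happens to be stable, so tied items keep their input order, but that is a property of this particular sort routine rather than something the collation method defines.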

A similar approach may be taken with strings representing dates or other items that can be ordered chronologically or in some other natural fashion.
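As a minimal sketch of this idea in Python, each date string can be converted to a date value that then serves as its sort key. The day/month/year format and the sample strings are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical date strings in day/month/year form.
dates = ["03/03/2021", "12/01/2021", "25/12/2020"]

def date_key(text):
    """Parse a day/month/year string into a value that orders chronologically."""
    return datetime.strptime(text, "%d/%m/%Y")

# Plain string comparison looks at the day digits first, which is not chronological.
as_text = sorted(dates)

# Sorting by the parsed date gives the chronological order.
chronological = sorted(dates, key=date_key)
print(chronological)  # ['25/12/2020', '12/01/2021', '03/03/2021']
```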

To decide which of two strings comes first in alphabetical order, their first letters are compared first.

(If one string runs out of letters to compare, then it is deemed to come first; for example, "cart" comes before "carthorse".)
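The comparison described above can be sketched in Python as follows. The function name `comes_first` is hypothetical, and the sketch assumes both strings consist of unaccented lowercase Latin letters, so that comparing single characters matches alphabetical order.

```python
def comes_first(a: str, b: str) -> str:
    """Return whichever of a and b comes first in simple alphabetical order."""
    # Compare letters position by position until a difference is found.
    for x, y in zip(a, b):
        if x != y:
            return a if x < y else b
    # One string ran out of letters: the shorter one comes first.
    return a if len(a) <= len(b) else b

print(comes_first("carthorse", "cart"))  # cart
print(comes_first("cat", "car"))         # car
```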

[1] For example, the words kitāba (كتابة 'writing'), kitāb (كتاب 'book'), kātib (كاتب 'writer'), maktaba (مكتبة 'library'), maktab (مكتب 'office'), and maktūb (مكتوب 'fate' or 'written') are grouped under the triliteral root k-t-b (ك ت ب), which denotes 'writing'.

[2] Another form of collation is radical-and-stroke sorting, used for non-alphabetic writing systems such as the hanzi of Chinese and the kanji of Japanese, whose thousands of symbols cannot be ordered by a simple memorized convention in the way an alphabet can.
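A rough sketch of radical-and-stroke sorting in Python is given below. It assumes that each character's sort key is the pair (radical number, number of strokes beyond the radical), with radicals numbered as in the traditional 214-radical Kangxi scheme. The small lookup table is hand-entered for illustration and is not taken from any library; a real system would draw on a full character database and would need further rules for breaking ties.

```python
# Illustrative table: character -> (Kangxi radical number, additional strokes).
# Radical 9 is 人 and radical 75 is 木; the entries are assumptions for this sketch.
RADICAL_STROKE = {
    "森": (75, 8),
    "休": (9, 4),
    "木": (75, 0),
    "人": (9, 0),
    "林": (75, 4),
}

def radical_stroke_key(char):
    """Sort first by radical, then by the number of additional strokes."""
    return RADICAL_STROKE[char]

ordered = sorted(RADICAL_STROKE, key=radical_stroke_key)
print(ordered)  # ['人', '休', '木', '林', '森']
```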

In Greater China, surname stroke ordering is a convention in some official documents where people's names are listed without hierarchy.

When information is stored in digital systems, collation may become an automated process.

It is then necessary to implement an appropriate collation algorithm that allows the information to be sorted in a satisfactory manner for the application in question.

Often the aim will be to achieve an alphabetical or numerical ordering that follows the standard criteria as described in the preceding sections.

So a computer program might treat the characters a, b, C, d, and $ as being ordered $, C, a, b, d (the corresponding ASCII codes are $ = 36, C = 67, a = 97, b = 98, and d = 100).
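This behaviour can be reproduced with a short Python sketch. Python compares single characters by code point, which for these five characters coincides with the ASCII code.

```python
chars = ["a", "b", "C", "d", "$"]

# Default sorting compares the numeric character codes.
by_code = sorted(chars)
codes = {c: ord(c) for c in by_code}

print(by_code)  # ['$', 'C', 'a', 'b', 'd']
print(codes)    # {'$': 36, 'C': 67, 'a': 97, 'b': 98, 'd': 100}
```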

Ordering by character code is therefore often applied with certain alterations, the most obvious being case conversion (often to uppercase, for historical reasons[note 1]) before comparison of ASCII values.
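A minimal Python sketch of case conversion before comparison is shown below. The word list is hypothetical, and `str.upper` stands in for whatever case conversion a given system applies.

```python
words = ["banana", "Cherry", "apple"]

# Raw code comparison puts every uppercase letter before every lowercase one.
raw = sorted(words)

# Converting each word to uppercase for comparison only removes the case effect;
# the original spelling of each word is preserved in the result.
case_folded = sorted(words, key=str.upper)

print(raw)          # ['Cherry', 'apple', 'banana']
print(case_folded)  # ['apple', 'banana', 'Cherry']
```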

For example, pages, sections, chapters, and the like, as well as the items of lists, are frequently "numbered" in this way.

For example, the Russian letters Ъ and Ь (which in writing are only used for modifying the preceding consonant), and usually also Ы, Й, and Ё, are omitted.
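As an illustration, the following Python sketch builds a set of list labels from the Russian alphabet with these letters left out. It assumes that all five letters are omitted (the text notes that the last three are only usually omitted), and the `label_items` helper and the "А)" label format are hypothetical.

```python
RUSSIAN_ALPHABET = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

# Letters assumed to be skipped when labelling items.
OMITTED = set("ЪЬЫЙЁ")

LABELS = [ch for ch in RUSSIAN_ALPHABET if ch not in OMITTED]

def label_items(items):
    """Attach consecutive letter labels to the items of a list."""
    if len(items) > len(LABELS):
        raise ValueError("more items than available letter labels")
    return [f"{label}) {item}" for label, item in zip(LABELS, items)]

print(len(LABELS))                       # 28
print(label_items(["first", "second"]))  # ['А) first', 'Б) second']
```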