Bitmap index

The extreme case of low cardinality is Boolean data (e.g., does a resident in a city have internet access?

Bitmap indexes have a significant space and performance advantage over other structures for query of such data.

Their drawback is they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated: consequently, they are more often employed in read-only systems that are specialized for fast query - e.g., data warehouses, and generally unsuitable for online transaction processing applications.

It is easy to see that each bit in bitmap Y shows whether a particular row refers to a person who has internet access.

[2] For those who might be interested in experimenting with some of the ideas mentioned here, many of them are implemented in open source software such as FastBit,[3] the Lemur Bitmap Index C++ Library,[4] the Roaring Bitmap Java library[5] and the Apache Hive Data Warehouse system.

More importantly, bitmaps compressed with BBC, WAH, COMPAX, PLWAH, EWAH and CONCISE can directly participate in bitwise operations without decompression.

[15] The performance of schemes such as BBC, WAH, PLWAH, EWAH, COMPAX and CONCISE is dependent on the order of the rows.

Reshuffling techniques have also been proposed to achieve the same results of sorting when indexing streaming data.

Without considering compression, Chan and Ioannidis analyzed a class of multi-component encoding methods and came to the conclusion that two-component encoding sits at the kink of the performance vs. index size curve and therefore represents the best trade-off between index size and query performance.

The concept of bitmap index was first introduced by Professor Israel Spiegler and Rafi Maayan in their research "Storage and Retrieval Considerations of Binary Data Bases", published in 1985.

[24] The first commercial database product to implement a bitmap index was Computer Corporation of America's Model 204.

[25] This implementation is a hybrid between the basic bitmap index (without compression) and the list of Row Identifiers (RID-list).

When the column cardinality is low, each leaf node of the B-tree would contain long list of RIDs.

Applying this access strategy to B-tree indexes can also combine range queries on multiple columns.

In this approach, a temporary in-memory bitmap is created with one bit for each row in the table (1 MB can thus store over 8 million entries).