The open-source project to build Apache Parquet began as a joint effort between Twitter[3] and Cloudera.[4] Parquet was designed as an improvement on the Trevni columnar storage format created by Doug Cutting, the creator of Hadoop.[8] The values in each column are stored in contiguous memory locations, which makes per-column scans and column-wise compression efficient.[9] Apache Parquet is implemented using the Apache Thrift framework, which increases its flexibility; it can work with a number of programming languages, such as C++, Java, Python, and PHP.[13] Parquet implements a hybrid of bit packing and run-length encoding (RLE), in which the encoding switches based on which produces the better compression result.
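The column-oriented layout described above can be illustrated with a minimal sketch. This is not Parquet's on-disk format, just a toy illustration of the idea that each column's values sit in one contiguous buffer, so a query over a single column never touches the other columns' data:

```python
from array import array

# Toy records (hypothetical data, not from Parquet itself).
rows = [("alice", 30), ("bob", 25), ("carol", 35)]

# Column-oriented layout: one contiguous sequence per field.
names = [r[0] for r in rows]
ages = array("i", (r[1] for r in rows))  # packed, contiguous integers

# An aggregate over "age" scans only that column's buffer,
# never reading the "name" values.
avg_age = sum(ages) / len(ages)
print(avg_age)  # → 30.0
```

A row-oriented store would have to read every record in full to answer the same query; keeping each column contiguous also means homogeneous data is handed to the compressor, which is why columnar formats compress well.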
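The bit-packing/RLE hybrid can be sketched in simplified form. This is not Parquet's actual on-disk encoding (which packs values into groups of eight and uses a length-prefixed header); it is a toy version of the same idea: long runs of a repeated value are cheaper as RLE pairs, while short runs fall back to recording values at a minimal bit width:

```python
from itertools import groupby

def min_bits(values):
    """Bits needed to represent the largest value (at least 1)."""
    return max(max(values).bit_length(), 1)

def encode(values, rle_threshold=4):
    """Split the input into runs of equal values; runs at or above the
    threshold become ('rle', count, value), shorter runs become
    ('packed', bit_width, values). A real encoder would pack the
    bits into bytes; here we just record the chosen width."""
    out = []
    for value, group in groupby(values):
        run = list(group)
        if len(run) >= rle_threshold:
            out.append(("rle", len(run), value))
        else:
            out.append(("packed", min_bits(run), run))
    return out

def decode(runs):
    values = []
    for run in runs:
        if run[0] == "rle":
            _, count, value = run
            values.extend([value] * count)
        else:
            values.extend(run[2])
    return values

data = [7, 7, 7, 7, 7, 7, 1, 2, 3, 0, 0, 0, 0]
encoded = encode(data)
assert decode(encoded) == data
print(encoded)
```

The per-run switch is the essence of the hybrid: repetitive columns (for example, a low-cardinality dictionary-encoded column) collapse into a few RLE pairs, while high-entropy stretches pay only the minimal bit width instead of a full machine word per value.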
Apache Arrow is designed as an in-memory complement to on-disk columnar formats like Parquet and ORC.