Apache Cassandra is a free and open-source database management system designed to handle large volumes of data across multiple commodity servers.[2]
As a wide-column database, Cassandra supports flexible schemas and efficiently handles data models with numerous sparse columns.[2]
Cassandra supports computer clusters which may span multiple data centers,[3] featuring asynchronous and masterless replication.[4]
Avinash Lakshman, a co-author of Amazon's Dynamo, and Prashant Malik developed Cassandra at Facebook to support the inbox search functionality.[7]
The developers at Facebook named their database after Cassandra, the mythological Trojan prophetess, referencing her curse of making prophecies that were never believed.
The system employs configurable replication strategies to distribute data across clusters, providing redundancy and disaster recovery capabilities.
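As a rough illustration of how a replication strategy might assign copies of a row to nodes, the sketch below places nodes on a hash ring and walks clockwise from the key's position to pick replicas. It shows the general idea of ring-based placement and is not Cassandra's actual implementation: the node names, the MD5-based `token` function, and the `replicas_for` helper are all assumptions made for this example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (illustrative hash, not Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes that would hold copies of `key`.

    The first replica is the node with the first token at or after the key's
    token; the remaining replicas are the next distinct nodes clockwise.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    # Wrap around to the start of the ring if the key hashes past the last token.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    # A cluster cannot hold more distinct replicas than it has nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
    print(replicas_for("user:42", cluster, replication_factor=3))
```

With a replication factor of 3, each key ends up on three distinct nodes, so the loss of any single node still leaves two copies available.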
The system is capable of linear scaling, which increases read and write throughput with the addition of new nodes, while maintaining continuous service.
The Cassandra Query Language (CQL) adds an abstraction layer that hides the implementation details of the underlying data structure and provides native syntax for collections and other common encodings.
CQL 3.0 provides statements for creating a keyspace and the column families it contains.[15]
Cassandra uses a peer-to-peer gossip protocol for cluster communication.
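To make the idea of gossip-based communication more concrete, here is a small simulation under a simplified push model: in each round, every node that already knows a piece of state forwards it to a few randomly chosen peers. This is only a sketch of the general technique, not Cassandra's gossip implementation; the function name `gossip_rounds`, the fanout value, and the cluster sizes are hypothetical.

```python
import random


def gossip_rounds(node_count: int, fanout: int = 3, seed: int = 0) -> int:
    """Count the rounds until every node has received an update.

    Node 0 starts with the update. In each round, every informed node
    pushes it to up to `fanout` peers chosen uniformly at random.
    """
    if node_count < 1:
        raise ValueError("node_count must be positive")
    if fanout < 1:
        raise ValueError("fanout must be positive")
    rng = random.Random(seed)  # seeded so runs are repeatable
    informed = {0}
    rounds = 0
    while len(informed) < node_count:
        newly_informed = set()
        for node in informed:
            peers = [p for p in range(node_count) if p != node]
            newly_informed.update(rng.sample(peers, min(fanout, len(peers))))
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, gossip_rounds(size))
```

In this model the number of rounds tends to grow slowly (roughly logarithmically) with the number of nodes, which is one reason gossip-style dissemination suits clusters with no central coordinator.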
Notably, Cassandra permanently removes nodes only through explicit administrative decommissioning or rebuilding, preventing temporary communication failures or restarts from triggering unnecessary data rebalancing.[17]
The nodetool utility also offers a number of commands that return Cassandra metrics pertaining to disk usage, latency, compaction, garbage collection, and more.