Erasure code

The fraction k’/k, where k’ denotes the number of symbols required for recovery, is called reception efficiency.

Erasure coding was invented by Irving Reed and Gustave Solomon in 1960.

[1] As of 2023, modern data storage systems can be designed to tolerate the complete failure of a few disks without data loss, using one of 3 approaches:[2][3][4] While technically RAID can be seen as a kind of erasure code,[5] "RAID" is generally applied to an array attached to a single host computer (which is a single point of failure), while "erasure coding" generally implies multiple hosts,[3] sometimes called a Redundant Array of Inexpensive Servers (RAIS).

[4][6] Compared to block-level RAID systems, object storage erasure coding has some significant differences that make it more resilient.

, is erased, it can be easily recovered by summing the remaining variables: RAID 5 is a widely-used application of the parity check erasure code.

Err-mail works just like e-mail, except Instead of asking Bob to acknowledge the messages she sends, Alice devises the following scheme.

Bob can perform this procedure using any two err-mails, so the erasure code in this example has a rate of 40%.

For truly generic erasure codes that work over any data set, we would need something other than the f(i) given.

He then constructs a (Lagrange) polynomial p(x) of order k such that p(i) is equal to data symbol i.

So we have enough data points to construct r and evaluate it to find the lost packets.

Near-optimal erasure codes trade correction capabilities for computational complexity: practical algorithms can encode and decode with linear time complexity.

This issue occurs in distributed storage systems where communication to maintain encoded redundancy is a problem.

[12] Erasure coding is now standard practice for reliable data storage.

[13][14][15] In particular, various implementations of Reed-Solomon erasure coding are used by Apache Hadoop, the RAID-6 built into Linux, Microsoft Azure, Facebook cold storage, and Backblaze Vaults.

However, replication incurs significant overhead in terms of wasted bytes.

The most common form of erasure coding used in storage systems is Reed-Solomon (RS) code, an advanced mathematics formula used to enable regeneration of missing data from pieces of known data, called parity blocks.

Example: In RS (10, 4) code, which is used in Facebook for their HDFS,[16] 10 MB of user data is divided into ten 1MB blocks.

Then, four additional 1 MB parity blocks are created to provide redundancy.

Initially, erasure codes were used to reduce the cost of storing "cold" (rarely-accessed) data efficiently; but erasure codes can also be used to improve performance serving "hot" (more-frequently-accessed) data.