Paxos (computer science)

[1] Consensus protocols are the basis for the state machine replication approach to distributed computing, as suggested by Leslie Lamport[2] and surveyed by Fred Schneider.

Although no deterministic fault-tolerant consensus protocol can guarantee progress in an asynchronous network (a result proved in a paper by Fischer, Lynch and Paterson[6]), Paxos guarantees safety (consistency), and the conditions that could prevent it from making progress are difficult to provoke.

In 1988, Lynch, Dwork and Stockmeyer had demonstrated the solvability of consensus in a broad family of "partially synchronous" systems.

[7] Paxos has strong similarities to a protocol used for agreement in "viewstamped replication", first published by Oki and Liskov in 1988, in the context of distributed transactions.

[8] Notwithstanding this prior work, Paxos offered a particularly elegant formalism, and included one of the earliest proofs of safety for a fault-tolerant distributed consensus protocol.

Most reliable multicast protocols lack these properties, which are required for implementations of the state machine replication model.

[10] Paxos protocols are members of a theoretical class of solutions to a problem formalized as uniform agreement with crash failures.

[11] Derecho,[12] a C++ software library for cloud-scale state machine replication, offers a Paxos protocol that has been integrated with self-managed virtually synchronous membership.

However, using reconfiguration, a protocol may be employed which survives any number of total failures as long as no more than F fail simultaneously.

This is supported by the Fischer Lynch Paterson impossibility result (FLP)[6] which states that a consistency protocol can only have two of safety, liveness, and fault tolerance.

[17] This reduces the message complexity significantly, without sacrificing correctness: In Paxos, clients send commands to a leader.

[16]By merging roles, the protocol "collapses" into an efficient client-master-replica style deployment, typical of the database community.

[18] The benefit of the Paxos protocols (including implementations with merged roles) is the guarantee of its safety properties.

Each "instance" (or "execution") of the basic Paxos protocol decides on a single output value.

[20] Because of the agreement and validity guarantees of Paxos, if accepted by a Quorum, then the Proposer is now known to be the leader to all other nodes.

Some cases show how the Basic Paxos protocol copes with the failure of certain (redundant) components of the distributed system.

In the diagram below, there is 1 Client, 1 Proposer, 3 Acceptors (i.e. the Quorum size is 3) and 2 Learners (represented by the 2 vertical lines).

A typical deployment of Paxos requires a continuous stream of agreed values acting as commands to a distributed state machine.

In the following diagram, only one instance (or "execution") of the basic Paxos protocol, with an initial Leader (a Proposer), is shown.

A common deployment of the Multi-Paxos consists in collapsing the role of the Proposers, Acceptors and Learners to "Servers".

"[22]An example involving three main acceptors, one auxiliary acceptor and quorum size of three, showing failure of one main processor and subsequent reconfiguration: Fast Paxos generalizes Basic Paxos to reduce end-to-end message delays.

Fast Paxos allows 2 message delays, but requires that (1) the system be composed of 3f+ 1 acceptors to tolerate up to f faults (instead of the classic 2f+1), and (2) the Client to send its request to multiple destinations.

[16] The main discovery involves optimizations of Paxos when conflicting proposals could be applied in any order.

In order to illustrate Generalized Paxos, the example below shows a message flow between two concurrently executing clients and a replicated state machine implementing read/write operations over two distinct registers A and B.

The above message flow shows us that Generalized Paxos can leverage operation semantics to avoid collisions when the spontaneous ordering of the network fails.

However, when a collision occurs, Generalized Paxos needs two additional round trips to recover.

[26] Byzantine Paxos[27] introduced by Castro and Liskov adds an extra message (Verify) which acts to distribute knowledge and verify the actions of the other processors: Fast Byzantine Paxos[28] introduced by Martin and Alvisi removes this extra delay, since the client sends commands directly to the Acceptors.

If this does not occur, the Acceptors themselves will also be aware of it (since they exchanged each other's messages in the broadcast round), and correct Acceptors will re-broadcast the agreed value: With the emergence of very high speed reliable datacenter networks that support remote DMA (RDMA), there has been substantial interest in optimizing Paxos to leverage hardware offloading, in which the network interface card and network routers provide reliability and network-layer congestion control, freeing the host CPU for other tasks.

The Paxos protocols employed by Derecho needed to be adapted to maximize asynchronous data streaming and remove other sources of delay on the leader's critical path.

In high-speed RDMA networks, even small delays can be large enough to prevent utilization of the full potential bandwidth.