Consensus Algorithms and Fault-Tolerance in Distributed Systems (2)

In the previous article, I introduced Dr. Zhilin Hu’s view on ‘basic problems, principles and consensus algorithms of distributed systems.’ In this article, I will continue to bring you Dr. Hu’s share.

His sharing is for:

1. Differences between typical consensus algorithms.

2. Fault-tolerance analysis of consensus algorithm.

3. Application of consensus algorithm.

Comparison of the typical consensus algorithm

Paxos is the basic algorithm for the non-Byzantine issue in the classical pure asynchronous model. Almost all the algorithms in the traditional distributed system are variations of or derived from Paxos, as shown in figure 1.

Figure 1

According to FLP theory I mentioned in the previous article, it is impossible to have a deterministic algorithm to solve the consistency problem. But in fact, there are a lot of premises and assumptions on this issue. By properly applying these assumptions, we can select a node as the leader to ensure the normal operation of the Paxos algorithm in practical work.

The process is as follows:

The Leader node will send orders to each node, and each node will reply “accept” after receiving the requests. After receiving more than half of the ‘acceptance,’ the Leader node will formally submit the proposal, which is the process of consensus completion.

Figure 2

For the Byzantine problem, the PBFT can implement its functions, but its complexity is polynomial. Its consensus-building process is much complex because there are many malicious nodes. In order not to be affected by these malicious nodes, each node needs to broadcast its message, taking advantage of such large communication cost, to obtain the consistency of the final consistency.

Therefore, the most apparent difference between Paxos and PBFT is that the PBFT has a significant increase in traffic. If the traffic of the Paxos can be regarded as the complexity of N, while the traffic of the PBFT is the complexity of N².

Figure 3

The characteristic of typical POW algorithm is that it sacrifices strong consistency and adopts final consistency, which simplifies the communication between nodes.

During the Leader election, every miner considers itself a Leader. News will then be broadcast by miners who are first mine the block, and the block will spread across the network according to the Gossip protocol. When a new block is received, other nodes stop mining and verify that the new block is correct. If correct, they will continue to mine on the new block. All the miners could see the whole process.

The two algorithms I mentioned above basically keep the miner’s behavior consistent by using majority decisions. Unlike Pow, it relies on economic incentives. For miners, the benefits of continuing to mine from new blocks far outweigh the benefits of mining.

Therefore, Paxos and PBFT rely on the majority principle, while POW relies on the game theory.

Why are these algorithms so different in performance?

Because the performance depends on how it works, in the process, the PBFT complexity is N², and the Paxos complexity is N.

Figure 4

• In Paxos, we can see that the Leader sends a message to N nodes, and then N nodes reply to the Leader, and the message will be broadcast. The time spent on making each such decision is Nμi (receiving N inputs) +μb (broadcasting time) + μo (output time), and then we divide this result by 1 to get its TPS.

• For a Bitcoin POW, the size of the block and the speed at which the block is generated are fixed. We multiply the rate of block generating by the size of each block and divide by the average number of bytes per transaction to gets its TPS.

It can be found from the two simplified models that the performance of the traditional consensus algorithm will decline rapidly with the increase of the number of nodes, while POW can guarantee a relatively constant block generation rate.

When the number of nodes is large, POW may be the best choice for true decentralization. PBFT algorithms are better suited for projects that use a few supernodes, such as EOS.

Consensus algorithm fault-tolerance analysis

There must be malicious nodes in a system. The fault-tolerant function of blockchain is the same as traditional algorithms and pays more attention to how to resist various attacks. For the Paxos algorithm, it cannot tolerate more than 50% of the nodes crashing.

A classic paper ‘The Byzantine Generals Problem’ detailed two solutions to the Byzantine model. In the oral protocol mode, it can only provide one-third tolerance at most. For the written protocol, any number of malicious nodes can exist on the premise that all faithful nodes maintain consistent common behavior.

As mentioned earlier, Paxos has a tolerance limit of 50% and the Byzantium to 1/3 tolerance. So, is it possible to combine some of our practical applications to achieve higher fault tolerance in specific scenarios?

The founder of Ethereum came up with a solution, and he introduced some preconditions:

• First, he introduced strong synchronization;

• Second, he introduced observers;

• Third, he changed the original consensus algorithm to a combination of delay selection algorithm and thresholds consensus algorithm.

By introducing these conditions, even if there is only a single faithful node, it can achieve 99% fault tolerance under strong synchronization condition.

Application of consensus algorithm

Traditional distributed applications are mainly based on Paxos and Paxos-based algorithms, like Raft.

For the consortium chain, due to the small number of nodes, the traditional consensus algorithm will still be adopted in the future, such as PBFT or PBFT-based algorithm.

For the private chain, due to more controllable application scenarios, the range of the consensus algorithm that can be selected is relatively wide, such as Raft and Paxos.

Trustworthy and Reliable Intelligent Autonomous Systems