閱讀56 返回首頁    go 阿裏雲 go 技術社區[雲棲]


讀書筆記:In Search of an Understandable Consensus Algorithm

在被Paxos的The Part-Time Parliament傷了之後,我自然而然就準備來看一下大家都說容易理解的Raft協議,也就是這篇In Search of an Understandable Consensus Algorithm,來給自己的智商重新衝值。

這篇論文開宗明義就說了:"Raft is a consensus algorithmfor managing a replicated log. It produces a result equivalent to Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election and log replication, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety. Results from a user study demonstrate that Raft is easier for students to learn than Paxos." 

裏麵除了說Raft功能和Paxos一樣,可是更容易理解,人見人愛之外。關鍵是這句: In order to enhance understandability, Raft separates the key elements of consensus, such as leader election and log replication, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. 意思是:為了提升可理解性,Raft 將一致性算法分解成了幾個關鍵模塊,例如領導人選舉、日誌複製和安全性。同時它通過實施一個更強的一致性來減少需要考慮的狀態的數量。

後麵這篇論文的內容大致就是講一段自己的理論就來一段Raft協議是多麼容易理解或者Paxos是多麼不好理解的論述。最後還搞了一個針對Paxos及Raft 算法的可理解能力的測試。整個方案是這樣的:“為了和 Paxos 比較 Raft 算法的可理解能力,我們針對高層次的本科生和研究生,在斯坦福大學的高級操作係統課程和加州大學伯克利分校的分布式計算課程上,進行了一次學習的實驗。我們分別拍了針對 Raft 和 Paxos 的視頻課程,並準備了相應的小測驗。Raft 的視頻講課覆蓋了這篇論文的所有內容除了日誌壓縮;Paxos 講課包含了足夠的資料來創建一個等價的複製狀態機,包括單決策 Paxos,多決策 Paxos,重新配置和一些實際係統需要的性能優化(例如領導人選舉)。小測驗測試一些對算法的基本理解和解釋一些邊角的示例。每個學生都是看完第一個視頻,回答相應的測試,再看第二個視頻,回答相應的測試。大約有一半的學生先進行 Paxos 部分,然後另一半先進行 Raft 部分,這是為了說明兩者獨立的區別從第一個算法處學來的經驗。我們計算參加人員的每一個小測驗的得分來看參與者是否在 Raft 算法上更加容易理解。”為了增加說服了作者還用線性回歸模型對學生成績數據進行了統計,最後結果當然是在可理解性上Raft勝出。

Raft協議之所以容易被理解是因為在Raft協議進行了算法分解(Raft 主要被分成了領導人選舉,日誌複製和安全三個模塊)和減少狀態機的狀態(相對於 Paxos,Raft 減少了非確定性和服務器互相處於非一致性的方式)。具體實現上有以下幾個獨特的特性:
Design for understandability(易於理解的設計): understandability was our most important criterion in evaluating design alternatives. We applied specific techniques to improve understandability, including decomposition (Raft separates leader election, log replication, and safety so that they can be understood relatively independently) and state space reduction (Raft reduces the degree of nondeterminism and the ways servers can be inconsistent with each other, in order to make it easier to reason about the system). 
Strong leader(強領導者): Raft differs from other consensus algorithms in that it employs a strong form of leadership where only leaders (or would-be leaders) issue requests; other servers are completely passive. This makes Raft easier to understand and also simplifies the implementation. 
Membership changes(關係調整): Raft’s mechanism for changing the set of servers in the cluster uses a simple joint consensus approach where the majorities of two different configurations overlap during transitions.

因為強領導者,Raft可以將一致性問題分解成了三個相對獨立的子問題:
Leader election(領導選舉): a newleadermust be chosenwhen an existing leader fails, and Raft must guarantee that exactly one leader is chosen (Section 5.2). 
Log replication(日誌複製): the leader must accept log entries from clients and replicate them faithfully across the cluster, forcing all other logs to agree with its own (Section 5.3). 
Safety(安全性): Section5.4describeshowRaft decideswhen a log entry has been committed, and how it ensures that leaders always hold all committed entries in their logs.

後續的章節對這幾個問題的具體實現進行了詳細的描述,甚至連實現的函數的定義都給了出來(RequestVote RPC和AppendEntries RPC)。整個論文可以說通俗易懂,大家可以從頭櫓到尾。
論文下載地址:https://fi.ee.tsinghua.edu.cn/media/files/2016/11/168263a78850135205a978e7230056d2/raft.pdf

最後更新:2017-08-22 02:32:17

  上一篇:go  新零售業務中台設計及產品體係解決方案
  下一篇:go  ng2腳手架安裝流程