Alibaba Cloud Technology Community [Yunqi]

Distributed Databases

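1. The CAP theorem in distributed databases

Consistency: data is updated consistently; every change is synchronized, so all nodes see the same data.
Availability: the system keeps answering requests, with good response times.
Partition tolerance: reliability. The system keeps operating when nodes fail or the network between them is cut. If a single piece of data is stored on 3 nodes and 1 node fails, the other 2 nodes can still work. This is implemented via replication (duplication), so there is no single point of failure.

Theorem: any distributed system can satisfy at most two of these three properties at the same time; it cannot deliver all three.
Advice: architects should not waste effort trying to design a perfect distributed system that satisfies all three; they should make trade-offs instead.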

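2. Why is partition tolerance mandatory?

Availability of a serial system vs. a parallel system (partition tolerance).

For application servers, parallel means a cluster of identical application servers, usually with a load balancer in front of it; at eBay such a cluster is called a pool.
For database servers, parallel means several hot-standby database servers (replication), usually at least two (a master and a failover server).

A large system typically has more than 30 stages in series. If every stage has a 99% success rate, the success rate of the whole system is 0.99^30, about 74%; if every stage has a 98% success rate, the whole system drops to 0.98^30, about 54.5%.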
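The serial figures above can be checked with a few lines of Python. This is a minimal sketch that assumes the stages fail independently; the function name `serial_availability` is chosen here for illustration and is not from the original post.

```python
def serial_availability(per_stage: float, stages: int) -> float:
    """Probability that every stage of a serial chain works.

    Assumes the stages fail independently of each other.
    """
    return per_stage ** stages


# Reproduce the two cases from the text: 30 stages at 99% and at 98%.
for p in (0.99, 0.98):
    print(f"{p:.0%} per stage, 30 stages -> {serial_availability(p, 30):.1%}")
```

Because the per-stage rate is raised to the number of stages, each extra stage multiplies the loss, which is why a chain of individually good components can still be unreliable as a whole.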

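For a parallel system, the success rate is given by the following formula:

P(system works) = 1 - P(individual node failing)^{number of nodes}

For example, if each module in the system has a 70% success rate, then 3 modules in parallel give an overall rate of 1 - 0.3^3 = 97.3%, and 4 in parallel give 1 - 0.3^4 = 99.19%. This assumes the modules fail independently. I suspect this is the mathematical reason why load balancing is dependable.

Five or six nines of QoS can only come from exponential thinking; linear thinking is a dead end.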
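The parallel arithmetic can be sketched the same way. The code below assumes independent node failures; `parallel_availability` and `replicas_needed` are illustrative names, and the five-nines target is only an example value.

```python
def parallel_availability(per_node: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent replicas works."""
    return 1 - (1 - per_node) ** nodes


def replicas_needed(per_node: float, target: float) -> int:
    """Smallest replica count whose combined availability reaches `target`."""
    if not (0 < per_node < 1 and 0 < target < 1):
        raise ValueError("per_node and target must lie strictly between 0 and 1")
    nodes = 1
    while parallel_availability(per_node, nodes) < target:
        nodes += 1
    return nodes


print(f"{parallel_availability(0.7, 3):.2%}")  # 97.30%
print(f"{parallel_availability(0.7, 4):.2%}")  # 99.19%
print(replicas_needed(0.7, 0.99999))           # replicas for five nines
```

Each added replica multiplies the remaining failure probability by 0.3, so the number of nines grows roughly linearly with the number of replicas. Real replicas often share failure causes (power, network, software bugs), so the independence assumption makes these numbers optimistic.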

Reference: https://blog.sina.com.cn/s/blog_5459f60d01016ntb.html

3. Given that partition tolerance is mandatory, why can we choose only one of consistency and availability?

You cannot, however, choose both consistency and availability in a distributed system.

As a thought experiment, imagine a distributed system which keeps track of a single piece of data using three nodes—A, B, and C—and which claims to be both consistent and available in the face of network partitions. Misfortune strikes, and that system is partitioned into two components: {A,B} and {C}. In this state, a write request arrives at node C to update the single piece of data.

That node only has two options:

Accept the write, knowing that neither A nor B will know about this new data until the partition heals.
Refuse the write, knowing that the client might not be able to contact A or B until the partition heals.

You either choose availability (Door #1) or you choose consistency (Door #2). You cannot choose both.
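The thought experiment can be played out in a toy model. This is only a sketch under simplifying assumptions: the `Node` class, the `write` function and the `"AP"`/`"CP"` mode strings are invented for illustration, and a real database would use a replication protocol rather than a direct loop over nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    value: str = "v0"


def write(value, component, cluster, mode):
    """Handle a write that arrives inside `component`.

    `component` is the list of nodes that can still reach each other;
    `cluster` is the list of all nodes. `mode` is "AP" or "CP".
    """
    partitioned = len(component) < len(cluster)
    if partitioned and mode == "CP":
        # Door #2: refuse the write to stay consistent.
        return False
    # Door #1 (or no partition at all): apply the write to every reachable node.
    for node in component:
        node.value = value
    return True


def run(mode):
    a, b, c = Node("A"), Node("B"), Node("C")
    cluster = [a, b, c]
    # The network splits into {A, B} and {C}; the write arrives at C.
    accepted = write("v1", component=[c], cluster=cluster, mode=mode)
    return accepted, [n.value for n in cluster]


print(run("AP"))  # (True, ['v0', 'v0', 'v1'])  available, but replicas diverge
print(run("CP"))  # (False, ['v0', 'v0', 'v0']) consistent, but the write is refused
```

In the `"AP"` run the client gets an answer but A and B still hold the old value; in the `"CP"` run all replicas agree but the client is turned away.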

Reference: https://codahale.com/you-cant-sacrifice-partition-tolerance/

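4. Advantages and disadvantages of distributed databases

Advantages:

Higher reliability and availability: when one site fails, the system can operate on an identical replica at another site, so a single failure does not bring down the whole system.
Better performance: the system can pick the data replica closest to the user, which reduces communication cost and improves overall performance.
Easy to scale: if the server software supports transparent horizontal scaling, more servers can be added to further distribute the data and share the processing load. (For more on horizontal scaling see https://xuezhongfeicn.blog.163.com/blog/static/22460141201201153456711/; eBay's database storage is scaled horizontally.)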
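One common way to distribute data across servers when scaling horizontally is to route each key to a shard by hashing it. The sketch below is an illustration only: the shard names are hypothetical, and the original post does not say which partitioning scheme eBay or any particular database uses.

```python
import hashlib
from typing import Sequence


def shard_for(key: str, shards: Sequence[str]) -> str:
    """Pick a shard for `key` using a stable hash of the key."""
    if not shards:
        raise ValueError("at least one shard is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Hypothetical shard names, for illustration only.
shards = ["db-0", "db-1", "db-2"]
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", shard_for(key, shards))
```

With plain modulo hashing, changing the number of shards remaps most keys; consistent hashing is a common way to reduce that movement.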

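Disadvantages:

Transaction management costs more than in a centralized database, and strong consistency is hard to guarantee.
High system overhead, most of it spent on communication.
Complex access structures: techniques that access data efficiently in a centralized system may no longer apply in a distributed one.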

Last updated: 2017-04-02 15:28:25
