

Reading notes: Large-scale cluster management at Google with Borg

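  A classic paper about Borg, the scheduling platform Google runs internally. I also suspect it was bait on Google's part: once people learned that Borg existed, plenty of them called on Google to open-source it, or at least to describe it in more detail. Google seized the moment and pushed Kubernetes: "Borg itself is not open source, but we have open-sourced Kubernetes, a newer and more general system built on the same foundation, so come and use it!" Kubernetes then took off.
   The most impressive thing about Borg is that it runs long-running services and batch jobs on the same machines. According to the paper, running them separately would cost roughly 20-30% more machines, which is a big deal. In the paper's words: “Since many other organizations run user-facing and batch jobs in separate clusters, we examined what would happen if we did the same. Figure 5 shows that segregating prod and non-prod work would need 20–30% more machines in the median cell to run our workload.” In other words, everyone else runs user-facing and batch jobs in separate clusters, so Google checked what would happen if it did the same, and found that the median cell would need 20-30% more machines.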
Purpose
    Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.
 
Benefits:
   1, Hides the details of resource management and failure handling
   2, Operates with very high reliability and availability, and supports applications that do the same
   3, Lets users run workloads across tens of thousands of machines

Concepts:
1,Borg cell: a set of machines that are managed as a unit.
2,Workload: Borg cells run a heterogeneous workload with two main parts.
The first is long-running services that should “never” go down, and handle short-lived latency-sensitive requests (a few ms to a few hundred ms). Such services are used for end-user-facing products such as Gmail, Google Docs, and web search, and for internal infrastructure services (e.g., BigTable).
The second is batch jobs that take from a few seconds to a few days to complete; these are much less sensitive to short-term performance fluctuations.
3,Cluster: The machines in a cell belong to a single cluster, defined by the high-performance datacenter-scale network fabric that connects them. A cluster lives inside a single datacenter building, and a collection of buildings makes up a site.
4,Jobs: A Borg job’s properties include its name, owner, and the number of tasks it has. Jobs can have constraints to force its tasks to run on machines with particular attributes such as processor architecture, OS version, or an external IP address.
5,Task: Each task maps to a set of Linux processes running in a container on a machine.
6,Alloc: A Borg alloc (short for allocation) is a reserved set of resources on a machine in which one or more tasks can be run; the resources remain assigned whether or not they are used.
7,Quota: Quota is used to decide which jobs to admit for scheduling. Quota is expressed as a vector of resource quantities (CPU, RAM, disk, etc.) at a given priority, for a period of time (typically months).
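
To make the Job and Quota concepts above a little more concrete, here is a minimal Python sketch. It is my own illustration, not Borg's data model: the class and function names (Resources, Job, admit, machine_matches), the units, and the idea of keying quota by (owner, priority) are all assumptions, and the time dimension of quota is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resources:
    """A resource vector; the units (cores, GiB) are illustrative assumptions."""
    cpu: float = 0.0
    ram_gib: float = 0.0
    disk_gib: float = 0.0

    def __add__(self, other):
        return Resources(self.cpu + other.cpu,
                         self.ram_gib + other.ram_gib,
                         self.disk_gib + other.disk_gib)

    def fits_in(self, limit):
        # Every dimension of the vector must stay within the limit.
        return (self.cpu <= limit.cpu
                and self.ram_gib <= limit.ram_gib
                and self.disk_gib <= limit.disk_gib)


@dataclass
class Job:
    """Hypothetical job record: name, owner, task count, plus constraints."""
    name: str
    owner: str
    task_count: int
    per_task: Resources
    priority: int
    constraints: dict = field(default_factory=dict)  # required machine attributes

    def total_request(self):
        n = self.task_count
        return Resources(self.per_task.cpu * n,
                         self.per_task.ram_gib * n,
                         self.per_task.disk_gib * n)


def admit(job, quota, used):
    """Admit the job only if the owner's quota at this priority covers it.

    quota and used map (owner, priority) -> Resources (an assumed layout).
    """
    key = (job.owner, job.priority)
    limit = quota.get(key)
    if limit is None:
        return False
    already = used.get(key, Resources())
    return (already + job.total_request()).fits_in(limit)


def machine_matches(job, machine_attrs):
    """True if the machine has every attribute the job's constraints require."""
    return all(machine_attrs.get(k) == v for k, v in job.constraints.items())
```

Admission here is a simple all-or-nothing check per job; a job that passes it is merely allowed into the pending queue and still has to be placed on machines afterwards.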
 
Architecture
Borgmaster: Each cell’s Borgmaster consists of two processes: the main Borgmaster process and a separate scheduler (§3.2). The main Borgmaster process handles client RPCs that either mutate state (e.g., create job) or provide read-only access to data (e.g., lookup job). It also manages state machines for all of the objects in the system (machines, tasks, allocs, etc.), communicates with the Borglets, and offers a web UI as a backup to Sigma.
 
Scheduling: When a job is submitted, the Borgmaster records it persistently in the Paxos store and adds the job’s tasks to the pending queue. This is scanned asynchronously by the scheduler, which assigns tasks to machines if there are sufficient available resources that meet the job’s constraints. (The scheduler primarily operates on tasks, not jobs.)
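
The scan over the pending queue can be sketched roughly as below. This is a toy of my own, assuming first-fit placement and only two resource dimensions; the names Task, Machine and schedule_pass are made up for illustration, and the real scheduler's placement policy is not described in these notes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    job: str
    index: int
    cpu: float
    ram_gib: float
    constraints: dict = field(default_factory=dict)  # inherited from the job


@dataclass
class Machine:
    name: str
    free_cpu: float
    free_ram_gib: float
    attrs: dict = field(default_factory=dict)

    def feasible(self, task):
        # Enough free resources AND every constraint satisfied.
        has_room = (task.cpu <= self.free_cpu
                    and task.ram_gib <= self.free_ram_gib)
        meets = all(self.attrs.get(k) == v
                    for k, v in task.constraints.items())
        return has_room and meets


def schedule_pass(pending, machines):
    """One scan of the pending queue, operating on tasks rather than jobs.

    Returns {(job, index): machine name}. Tasks that fit nowhere are put
    back on the queue for a later pass. First-fit is an assumption made
    for brevity.
    """
    placements = {}
    still_pending = deque()
    while pending:
        task = pending.popleft()
        target = next((m for m in machines if m.feasible(task)), None)
        if target is None:
            still_pending.append(task)
            continue
        target.free_cpu -= task.cpu
        target.free_ram_gib -= task.ram_gib
        placements[(task.job, task.index)] = target.name
    pending.extend(still_pending)
    return placements
```

Note that a task can stay pending even when some machine has spare capacity, simply because that machine does not meet the job's constraints.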
 
Borglet: The Borglet is a local Borg agent that is present on every machine in a cell. It starts and stops tasks; restarts them if they fail; manages local resources by manipulating OS kernel settings; rolls over debug logs; and reports the state of the machine to the Borgmaster and other monitoring systems.
 
Some small details
The vast majority of the Borg workload does not run inside virtual machines.
Borg writes the task's hostname and port into a consistent, highly-available file in Chubby.
All components of Borg are written in C++.
A key design feature in Borg is that already-running tasks continue to run even if the Borgmaster or a task's Borglet goes down.

Performance
Impressive across the board.

And finally, the plug for K8s:
The Kubernetes architecture goes further: it has an API server at its core that is responsible only for processing requests and manipulating the underlying state objects. The cluster management logic is built as small, composable micro-services that are clients of this API server, such as the replication controller, which maintains the desired number of replicas of a pod in the face of failures, and the node controller, which manages the machine lifecycle.  
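
The "controller as a client of the API server" idea can be sketched as a tiny reconcile loop. This is only an illustration under my own assumptions: FakeApiServer is an in-memory stand-in, not the Kubernetes API, and reconcile is a hypothetical name for what a replication controller does conceptually (compare the desired replica count with what is running, then create or delete pods to close the gap).

```python
class FakeApiServer:
    """In-memory stand-in for the state store; NOT the real Kubernetes API."""

    def __init__(self):
        self.pods = {}       # pod name -> phase ("Running", "Failed", ...)
        self._next_id = 0

    def create_pod(self, prefix):
        name = f"{prefix}-{self._next_id}"
        self._next_id += 1
        self.pods[name] = "Running"
        return name

    def delete_pod(self, name):
        del self.pods[name]

    def running_pods(self, prefix):
        return sorted(name for name, phase in self.pods.items()
                      if name.startswith(prefix + "-") and phase == "Running")


def reconcile(api, prefix, desired):
    """Move the number of running pods toward `desired`.

    Returns the difference that was acted on: positive means pods were
    created, negative means pods were deleted, zero means nothing to do.
    """
    running = api.running_pods(prefix)
    diff = desired - len(running)
    if diff > 0:
        for _ in range(diff):
            api.create_pod(prefix)
    elif diff < 0:
        for name in running[:-diff]:
            api.delete_pod(name)
    return diff
```

The controller holds no state of its own: everything it needs is read from the API server on each pass, which is what makes this kind of logic easy to split into small independent services.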

Last updated: 2017-08-19 01:33:18
