DRF算法

背景

在Mesos和YARN中，都用到了dominant resource fairness算法(DRF)，它不同於hadoop基於slot-based實現的fair scheduler和capacity scheduler，論文閱讀：Dominant Resource Fairness: Fair Allocation of Multiple Resource Types。
考慮在一個包括多種資源類型(主要考慮CPU和MEM)的係統的公平資源分配問題，其中不同用戶對資源有不同的需求。為了解決這個問題，伯克利的幾位大牛提出了Dominant Resource Fairness(DRF)，一種針對不同資源類型的max-min fairness。並且在Mesos的設計和實現中評估了DRF，顯示了它可以比slot-based 公平調度算法得到更好的吞吐量。

介紹

在任何共享的計算機係統中，資源分配都是一個關鍵的構建模塊。已經提出的最通用的分配策略是max-min fairness，這種策略會最大化係統中一個用戶收到的最小分配。假設每一個用戶都有足夠地請求，這種策略會給予每個用戶一份均等的資源。廣義的Max-min fairness包括權重(weight)的概念，用戶可以獲得與它的權重成正比的那一份資源。
加權max-min fairness的吸引力源於它的一般性和它能提供性能隔離的能力。加權max-min fairness模型可以支撐許多其他資源分配策略，包括優先級、deadline based allocation等。此外，加權max-min fairness確保隔離，一個用戶能確保收到它的那份資源，而不管其它用戶的需求。
鑒於這些特征，不同精確度的加權或非加權max-min fairness毫不意外地被大量已經提出的算法實現，如round-robin、proportional resource sharing、weighted fair queueing。這些算法已經被應用於不同的資源，包括鏈路帶寬、CPU、內存、存儲。
盡管已經在公平分配上做了大量地工作和實踐，但目前為止主要還是集中在單資源類型的場景下。甚至在多資源類型的環境下，用戶有異質資源請求，典型的資源分配做法還是使用單類型資源抽象。比如hadoop和Dryad的公平調度器，這兩種廣泛使用集群計算框架，在資源分配時使用插槽（slot），slot就是對節點資源按照固定大小進行劃分而產生的分區。然而事實是集群中不同的作業隊CPU、內存和IO資源有著不同的需求。

特性

DRF是一種通用的多資源的max-min fairness分配策略。DRF背後的直觀想法是在多環境下一個用戶的資源分配應該由用戶的dominant share（主導份額的資源）決定，dominant share是在所有已經分配給用戶的多種資源中，占據最大份額的一種資源。簡而言之，DRF試圖最大化所有用戶中最小的dominant share。
舉個例子，假如用戶A運行CPU密集的任務而用戶B運行內存密集的任務，DRF會試圖均衡用戶A的CPU資源份額和用戶B的內存資源份額。在單個資源的情形下，那麼DRF就會退化為max-min fairness。
DRF有四種主要特性，分別是：sharing incentive、strategy-proofness、Pareto efficiency和envy-freeness。
DRF是通過確保係統的資源在用戶之間是靜態和均衡地進行分配來提供sharing incentive，用戶不能獲得比其他用戶更多的資源。此外，DRF是strategy-proof，用戶不能通過謊報其資源需求來獲得更多的資源。DRF是Pareto-efficient，在滿足其他特性的同時，分配所有可以利用的資源，不用取代現有的資源分配。最後，DRF是envy-free，用戶不會更喜歡其他用戶的資源分配。

最大最小算法

在解釋DRF的計算方式之前，了解一下max-min算法的公平分配策略會更有幫助。例子來源於https://hi.baidu.com/dd_hz/item/4a0b75e970afcfcebbf37d8d這篇文章

不加權

有一四個用戶的集合,資源需求分別是2,2.6,4,5,其資源總能力為10,為其計算最大最小公平分配
解決方法:
我們通過幾輪的計算來計算最大最小公平分配.
第一輪,我們暫時將資源劃分成4個大小為2.5的.由於這超過了用戶1的需求,這使得剩了0.5個均勻的分配給剩下的3個人資源,給予他們每個2.66.這又超過了用戶2的需求,所以我們擁有額外的0.066…來分配給剩下的兩個用戶,給予每個用戶2.5+0.66…+0.033…=2.7.因此公平分配是:用戶1得到2,用戶2得到2.6,用戶3和用戶4每個都得到2.7.

加權

有一四個用戶的集合,資源需求分別是4,2,10,4,權重分別是2.5,4,0.5,1,資源總能力是16,為其計算最大最小公平分配.
解決方法:
第一步是標準化權重,將最小的權重設置為1.這樣權重集合更新為5,8,1,2.這樣我們就假裝需要的資源不是4份而是5+8+1+2=16份.因此將資源劃分成16份.在資源分配的每一輪,我們按照權重的比例來劃分資源,因此,在第一輪,我們計算c/n為16/16=1.在這一輪,用戶分別獲得5,8,1,2單元的資源,用戶1得到了5個資源,但是隻需要4,所以多了1個資源,同樣的,用戶2多了6個資源.用戶3和用戶4拖欠了,因為他們的配額低於需求.現在我們有7個單元的資源可以分配給用戶3和用戶4.他們的權重分別是1和2,最小的權重是1,因此不需要對權重進行標準化.給予用戶3額外的7 × 1/3單元資源和用戶4額外的7 × 2/3單元.這會導致用戶4的配額達到了2 + 7 × 2/3 = 6.666,超過了需求.所以我們將額外的2.666單元給用戶3,最終獲得1 + 7/3 + 2.666 = 6單元.最終的分配是,4,2,6,4,這就是帶權的最大最小公平分配.

DRF計算方式

考慮一個有9個cpu和18GB的係統，有兩個用戶：用戶A的每個任務都請求（1CPU，4GB）資源；用戶B的每個任務都請求（3CPU，4GB）資源。如何為這種情況構建一個公平分配策略？

通過列不等式方程可以解得給用戶A分配3份資源，用戶B分配2份資源是一個很好的選擇。DRF的算法偽代碼為：

使用DRF的思路，分配的過程如下表所示，注意，每一次選擇為哪個資源分配的決定，取決於上次分配之後，目前dominant share最小的那個用戶可以得到下一次的資源分配。

在這個例子中，用戶A的CPU占總CPU 1/9，MEM占總MEM的 2/9，而用戶B的CPU占1/3，MEM占2/9，所以A的主資源為內存，B的主資源為CPU。基於這點，DRF會最大化A的內存的最小化分配，並會最大化B的CPU的最小分配。

Weighted DRF換算方式

帶權重的情況下，每個用戶的dominant share不再是那個占用總資源百分比最大的那位資源，而是定義為Si = max{Ui,j / Wi,j}，即那份占總權重最大的資源作為dominant share，代替上述例子中的dominant share進行運算。

YARN DRF實現

最後附上hadoop 2.0裏YARN的server-resourcemanager模塊裏，Fair Scheduler裏實現的DRF策略的代碼（YARN的Scheduler主要實現的是Capacity和Fair，DRF是Fair裏的一種，此外還有FIFO、Fair Share）。

/**
 * Makes scheduling decisions by trying to equalize dominant resource usage.
 * A schedulable's dominant resource usage is the largest ratio of resource
 * usage to capacity among the resource types it is using.
 */
@Private
@Unstable
public class DominantResourceFairnessPolicy extends SchedulingPolicy {

  public static final String NAME = "DRF";

  private DominantResourceFairnessComparator comparator =
      new DominantResourceFairnessComparator();

  @Override
  public String getName() {
    return NAME;
  }

  @Override
  public byte getApplicableDepth() {
    return SchedulingPolicy.DEPTH_ANY;
  }

  @Override
  public Comparator<Schedulable> getComparator() {
    return comparator;
  }
  
  @Override
  public void computeShares(Collection<? extends Schedulable> schedulables,
      Resource totalResources) {
    for (ResourceType type : ResourceType.values()) {
      ComputeFairShares.computeShares(schedulables, totalResources, type);
    }
  }
  
  @Override
  public void initialize(Resource clusterCapacity) {
    comparator.setClusterCapacity(clusterCapacity);
  }

  public static class DominantResourceFairnessComparator implements Comparator<Schedulable> {
    private static final int NUM_RESOURCES = ResourceType.values().length;
    
    private Resource clusterCapacity;

    public void setClusterCapacity(Resource clusterCapacity) {
      this.clusterCapacity = clusterCapacity;
    }

    @Override
    public int compare(Schedulable s1, Schedulable s2) {
      ResourceWeights sharesOfCluster1 = new ResourceWeights();
      ResourceWeights sharesOfCluster2 = new ResourceWeights();
      ResourceWeights sharesOfMinShare1 = new ResourceWeights();
      ResourceWeights sharesOfMinShare2 = new ResourceWeights();
      ResourceType[] resourceOrder1 = new ResourceType[NUM_RESOURCES];
      ResourceType[] resourceOrder2 = new ResourceType[NUM_RESOURCES];
      
      // Calculate shares of the cluster for each resource both schedulables.
      calculateShares(s1.getResourceUsage(),
          clusterCapacity, sharesOfCluster1, resourceOrder1, s1.getWeights());
      calculateShares(s1.getResourceUsage(),
          s1.getMinShare(), sharesOfMinShare1, null, ResourceWeights.NEUTRAL);
      calculateShares(s2.getResourceUsage(),
          clusterCapacity, sharesOfCluster2, resourceOrder2, s2.getWeights());
      calculateShares(s2.getResourceUsage(),
          s2.getMinShare(), sharesOfMinShare2, null, ResourceWeights.NEUTRAL);
      
      // A queue is needy for its min share if its dominant resource
      // (with respect to the cluster capacity) is below its configured min share
      // for that resource
      boolean s1Needy = sharesOfMinShare1.getWeight(resourceOrder1[0]) < 1.0f;
      boolean s2Needy = sharesOfMinShare2.getWeight(resourceOrder2[0]) < 1.0f;
      
      int res = 0;
      if (!s2Needy && !s1Needy) {
        res = compareShares(sharesOfCluster1, sharesOfCluster2,
            resourceOrder1, resourceOrder2);
      } else if (s1Needy && !s2Needy) {
        res = -1;
      } else if (s2Needy && !s1Needy) {
        res = 1;
      } else { // both are needy below min share
        res = compareShares(sharesOfMinShare1, sharesOfMinShare2,
            resourceOrder1, resourceOrder2);
      }
      if (res == 0) {
        // Apps are tied in fairness ratio. Break the tie by submit time.
        res = (int)(s1.getStartTime() - s2.getStartTime());
      }
      return res;
    }
    
    /**
     * Calculates and orders a resource's share of a pool in terms of two vectors.
     * The shares vector contains, for each resource, the fraction of the pool that
     * it takes up.  The resourceOrder vector contains an ordering of resources
     * by largest share.  So if resource=<10 MB, 5 CPU>, and pool=<100 MB, 10 CPU>,
     * shares will be [.1, .5] and resourceOrder will be [CPU, MEMORY].
     */
    void calculateShares(Resource resource, Resource pool,
        ResourceWeights shares, ResourceType[] resourceOrder, ResourceWeights weights) {
      shares.setWeight(MEMORY, (float)resource.getMemory() /
          (pool.getMemory() * weights.getWeight(MEMORY)));
      shares.setWeight(CPU, (float)resource.getVirtualCores() /
          (pool.getVirtualCores() * weights.getWeight(CPU)));
      // sort order vector by resource share
      if (resourceOrder != null) {
        if (shares.getWeight(MEMORY) > shares.getWeight(CPU)) {
          resourceOrder[0] = MEMORY;
          resourceOrder[1] = CPU;
        } else  {
          resourceOrder[0] = CPU;
          resourceOrder[1] = MEMORY;
        }
      }
    }
    
    private int compareShares(ResourceWeights shares1, ResourceWeights shares2,
        ResourceType[] resourceOrder1, ResourceType[] resourceOrder2) {
      for (int i = 0; i < resourceOrder1.length; i++) {
        int ret = (int)Math.signum(shares1.getWeight(resourceOrder1[i])
            - shares2.getWeight(resourceOrder2[i]));
        if (ret != 0) {
          return ret;
        }
      }
      return 0;
    }
  }
}

總結

本文介紹了DRF算法的背景和特性，並對比最大最小算法，介紹了DRF帶權和不帶權情況下的資源分配方式。DRF的核心就是最大化所有用戶中最小的dominant share，而dominant share在帶權和不帶權的情況下有不同的計算方式，其實計算都很簡單，通過YARN實現的代碼也能體現。

希望通過本文的介紹，能讓讀者了解DRF算法的重要性，其與max-min在思路上的繼承和優化，並能從YARN和Mesos的實際代碼模塊中，獲得實現思路。

最後更新：2017-04-03 12:55:06

DRF算法

背景

介紹

特性

最大最小算法

不加權

加權

DRF計算方式

Weighted DRF換算方式

YARN DRF實現

總結

上一篇：錯誤解決：[A potentially dangerous Request.Form value was detected from the client]

下一篇：刪除Outlook2010中默認的賬戶或默認的配置

相關內容

熱門內容

最新內容

DRF算法

背景

介紹

特性

最大最小算法

不加權

加權

DRF計算方式

Weighted DRF換算方式

YARN DRF實現

總結

上一篇： 錯誤解決：[A potentially dangerous Request.Form value was detected from the client]

下一篇： 刪除Outlook2010中默認的賬戶或默認的配置

相關內容

熱門內容

最新內容

上一篇：錯誤解決：[A potentially dangerous Request.Form value was detected from the client]

下一篇：刪除Outlook2010中默認的賬戶或默認的配置