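ACL Implementation for the Capacity Scheduler on EMR Clusters
Background
The previous article introduced how YARN's capacity scheduler works and experimented with using it on an EMR cluster to isolate cluster resources and enforce quotas. This article covers how to set up ACLs for the capacity scheduler on an EMR cluster.
Why do this? The previous setup split the cluster's resources into multiple queues, each with its own resource share and job scheduling priority. If every tenant sticks to the agreement and submits jobs only to its own queue, there is no problem. But a user who is familiar with how the capacity scheduler works can still take up the queues that belong to other groups. That is why the capacity scheduler has ACL settings.
Key parameters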
- yarn.scheduler.capacity.queue-mappings
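- Specifies the mapping between users (u) or groups (g) and queues. With a mapping in place, a user's jobs go to the mapped queue by default, without having to pass a queue parameter, which is convenient. The format of the parameter is:
[u|g]:[name]:[queue_name][,next mapping]*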
- yarn.scheduler.capacity.root.{queue-path}.acl_administer_queue
- Specifies who can administer the jobs in this queue. The English description is:
The ACL of who can administer jobs on the default queue.
An asterisk (*) means all users; a single space means none.
- yarn.scheduler.capacity.root.{queue-path}.acl_submit_applications
- Specifies who can submit jobs to this queue. The English description is:
The ACL of who can submit jobs to the queue.
An asterisk (*) means all users; a single space means none.
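To make the queue-mappings format concrete, here is a minimal Python sketch that parses a value such as the one used later in this article and looks up the default queue for a user. The function names `parse_queue_mappings` and `resolve_queue` are made up for illustration, and the sketch assumes that the first matching entry wins; it is not the scheduler's own code.

```python
def parse_queue_mappings(value):
    """Parse '[u|g]:[name]:[queue_name][,next mapping]*' into (kind, name, queue) tuples."""
    mappings = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        kind, name, queue = entry.split(":")
        if kind not in ("u", "g"):
            raise ValueError(f"mapping type must be 'u' or 'g': {entry!r}")
        mappings.append((kind, name, queue))
    return mappings


def resolve_queue(mappings, user, groups=()):
    """Return the queue of the first mapping matching the user or one of the groups.

    Assumption of this sketch: entries are evaluated in order and the first match wins.
    """
    for kind, name, queue in mappings:
        if kind == "u" and name == user:
            return queue
        if kind == "g" and name in groups:
            return queue
    return None  # no mapping matched; what happens next is up to the caller


mappings = parse_queue_mappings("u:hadoop:b,u:root:a")
print(resolve_queue(mappings, "hadoop"))  # b
print(resolve_queue(mappings, "root"))    # a
```

A user without any mapping gets `None` here, which corresponds to the case where the queue has to be chosen some other way.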
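Steps on an EMR cluster
- Create an EMR cluster
- Modify the following settings to enable queue ACLs
- yarn-site:
yarn.acl.enable=true
- mapred-site:
mapreduce.cluster.acls.enabled=true
- hdfs-site:
dfs.permissions.enabled=true
This has nothing to do with the capacity scheduler's queue ACLs; it controls HDFS permission checking and is simply set here at the same time.
- hdfs-site:
mapreduce.job.acl-view-job=*
If dfs.permissions.enabled=true is configured, this needs to be set as well; otherwise job information cannot be viewed in the Hadoop UI.
- Restart YARN and HDFS so that the configuration takes effect (as the root user)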
su -l hdfs -c '/usr/lib/hadoop-current/sbin/stop-dfs.sh'
su -l hadoop -c '/usr/lib/hadoop-current/sbin/stop-yarn.sh'
su -l hdfs -c '/usr/lib/hadoop-current/sbin/start-dfs.sh'
su -l hadoop -c '/usr/lib/hadoop-current/sbin/start-yarn.sh'
su -l hadoop -c '/usr/lib/hadoop-current/sbin/yarn-daemon.sh start proxyserver'
Modify the capacity scheduler configuration
Full configuration
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>
Maximum number of applications that can be pending and running.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.25</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls number of concurrent running
applications.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
<description>
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>a,b,default</value>
<description>
The queues at this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>20</value>
<description>Default queue target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.capacity</name>
<value>30</value>
<description>Queue a target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.capacity</name>
<value>50</value>
<description>Queue b target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.state</name>
<value>RUNNING</value>
<description>
The state of queue a. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.state</name>
<value>RUNNING</value>
<description>
The state of queue b. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.acl_submit_applications</name>
<value> </value>
<description>
The ACL of who can submit jobs to the root queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
<value>root</value>
<description>
The ACL of who can submit jobs to queue a.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.acl_submit_applications</name>
<value>hadoop</value>
<description>
The ACL of who can submit jobs to queue b.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>root</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.acl_administer_queue</name>
<value> </value>
<description>
The ACL of who can administer jobs on the root queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>root</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.acl_administer_queue</name>
<value>root</value>
<description>
The ACL of who can administer jobs on queue a.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.acl_administer_queue</name>
<value>root</value>
<description>
The ACL of who can administer jobs on queue b.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>40</value>
<description>
Number of missed scheduling opportunities after which the CapacityScheduler
attempts to schedule rack-local containers.
Typically this should be set to number of nodes in the cluster, By default is setting
approximately number of nodes in one rack which is 40.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value>u:hadoop:b,u:root:a</value>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
<value>false</value>
<description>
If a queue mapping is present, will it override the value specified
by the user? This can be used by administrators to place jobs in queues
that are different than the one specified by the user.
The default is false.
</description>
</property>
</configuration>
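The configuration above defines three queues and their resource shares. When no queue is specified, user hadoop submits to queue b by default and root submits to queue a. In addition, hadoop can only submit jobs to queue b, while root can submit jobs to every queue. Other users have no permission to submit jobs.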
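As a quick sanity check on a configuration like the one above, the capacities of the queues under root can be read from the XML and added up; with percentage capacities they are expected to total 100. The sketch below uses only the Python standard library and a shortened copy of the relevant properties; `load_properties` and `child_capacities` are helper names invented for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the capacity-related properties from the full configuration.
SNIPPET = """
<configuration>
  <property><name>yarn.scheduler.capacity.root.queues</name><value>a,b,default</value></property>
  <property><name>yarn.scheduler.capacity.root.default.capacity</name><value>20</value></property>
  <property><name>yarn.scheduler.capacity.root.a.capacity</name><value>30</value></property>
  <property><name>yarn.scheduler.capacity.root.b.capacity</name><value>50</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def child_capacities(props, parent="root"):
    """Return {queue: capacity} for the direct children of the given parent queue."""
    prefix = f"yarn.scheduler.capacity.{parent}"
    children = [q.strip() for q in props[f"{prefix}.queues"].split(",")]
    return {q: float(props[f"{prefix}.{q}.capacity"]) for q in children}


props = load_properties(SNIPPET)
capacities = child_capacities(props)
total = sum(capacities.values())
print(capacities, total)
if total != 100.0:
    print("warning: capacities under root do not add up to 100")
```

The same helpers could be pointed at the real capacity-scheduler.xml by reading the file contents into `load_properties`.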
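Pitfalls
- Configuring acl_administer_queue
- The configuration supports ACLs for two operations: acl_administer_queue and acl_submit_applications. Semantically, to control whether a user can submit jobs it should be enough to set the queue's acl_submit_applications property, and the documentation reads that way too. In practice it is not: anyone with administer permission can also submit jobs. This took a long time to track down and was only found by reading the source code: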
@Override
public void submitApplication(ApplicationId applicationId, String userName,
    String queue) throws AccessControlException {
  // Careful! Locking order is important!
  // Check queue ACLs
  UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(userName);
  // Submission is rejected only if the user has NEITHER submit NOR administer access
  if (!hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi)
      && !hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
    throw new AccessControlException("User " + userName + " cannot submit" +
        " applications to queue " + getQueuePath());
  }
  // ... (rest of the method omitted in this excerpt)
}
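- Configuring the root queue
- To restrict users' access to queues, the root queue must be configured as well; setting only the leaf queues is not enough. Permissions granted on a parent queue take precedence: a user is allowed if the queue or any of its ancestors grants access. As the comment in the code puts it:
// recursively look up the queue to see if parent queue has the permission
This is also not what most people would expect. So the root queue's permissions have to be locked down first; otherwise the permissions configured on the child queues have no effect at all.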
<property>
<name>yarn.scheduler.capacity.root.acl_submit_applications</name>
<value> </value>
<description>
The ACL of who can submit jobs to the root queue.
</description>
</property>
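The two pitfalls can be reproduced with a small Python model. This is a simplified sketch, not Hadoop's implementation: the `Queue` class, the `"submit"`/`"admin"` keys, and the helper names are invented for illustration, ACL strings are assumed to have the form `users groups` with comma-separated names, and an unset ACL is treated as "nobody". It encodes the two behaviours described above: access is granted if the queue or any ancestor grants it, and either submit or administer permission is enough to submit.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Queue:
    name: str
    acls: Dict[str, str] = field(default_factory=dict)  # e.g. {"submit": "root", "admin": " "}
    parent: Optional["Queue"] = None


def acl_allows(acl, user, groups=()):
    """Simplified ACL check: '*' allows everyone, a single space allows nobody."""
    if acl.strip() == "*":
        return True
    parts = acl.split(" ", 1)
    users = [u.strip() for u in parts[0].split(",") if u.strip()]
    acl_groups = [g.strip() for g in parts[1].split(",") if g.strip()] if len(parts) > 1 else []
    return user in users or any(g in acl_groups for g in groups)


def has_access(queue, action, user, groups=()):
    """Allow if this queue or any ancestor queue grants the action to the user."""
    q = queue
    while q is not None:
        # Assumption of this sketch: an ACL that is not set counts as " " (nobody).
        if acl_allows(q.acls.get(action, " "), user, groups):
            return True
        q = q.parent
    return False


def can_submit(queue, user, groups=()):
    """Submit or administer permission is sufficient, as in the quoted submitApplication check."""
    return has_access(queue, "submit", user, groups) or has_access(queue, "admin", user, groups)


def build_queues(root_acl):
    """Build root with children a, b, default using the ACL values from this article."""
    root = Queue("root", {"submit": root_acl, "admin": root_acl})
    return {
        "a": Queue("a", {"submit": "root", "admin": "root"}, root),
        "b": Queue("b", {"submit": "hadoop", "admin": "root"}, root),
        "default": Queue("default", {"submit": "root", "admin": "root"}, root),
    }


locked = build_queues(" ")     # root queue locked down with a single space
open_root = build_queues("*")  # root queue left open to everyone

print(can_submit(locked["b"], "hadoop"))     # True: hadoop is in b's submit ACL
print(can_submit(locked["a"], "hadoop"))     # False: neither a nor root grants access
print(can_submit(locked["b"], "root"))       # True: root administers b, which is enough
print(can_submit(open_root["a"], "hadoop"))  # True: the open root ACL overrides a's restriction
```

With the root queue locked down, the model matches the behaviour described for the full configuration: hadoop is confined to queue b, root can submit everywhere through its administer permission, and any other user is rejected.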
Last updated: 2017-05-15 10:03:16