Storm Topology的並發度
Understanding the parallelism of a Storm topology
https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology
概念
一個Topology可以包含一個或多個worker(並行的跑在不同的machine上), 所以worker process就是執行一個topology的子集, 並且worker隻能對應於一個topology
一個worker可用包含一個或多個executor, 每個component (spout或bolt)至少對應於一個executor, 所以可以說executor執行一個compenent的子集, 同時一個executor隻能對應於一個component
Task就是具體的處理邏輯對象, 一個executor線程可以執行一個或多個tasks
但一般默認每個executor隻執行一個task, 所以我們往往認為task就是執行線程, 其實不然
task代表最大並發度, 一個component的task數是不會改變的, 但是一個componet的executer數目是會發生變化的
當task數大於executor數時, executor數代表實際並發數
A worker process executes a subset of a topology.
A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology.
A running topology consists of many such processes running on many machines within a Storm cluster.
An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).
A task performs the actual data processing — each spout or bolt that you implement in your code executes as many tasks across the cluster.
The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks
.
By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
Configuring the parallelism of a topology, 並發度的配置
The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them.
Storm currently has the following order of precedence for configuration settings:
defaults.yaml
< storm.yaml
< topology-specific configuration < internal component-specific configuration < external component-specific configuration
對於並發度的配置, 在storm裏麵可以在多個地方進行配置, 優先級如上麵所示...
具體包含,
worker processes的數目, 可以通過配置文件和代碼中配置, worker就是執行進程, 所以考慮並發的效果, 數目至少應該大於machines的數目
executor的數目, component的並發線程數,隻能在代碼中配置(通過setBolt和setSpout的參數), 例如, setBolt("green-bolt", new GreenBolt(), )
tasks的數目, 可以不配置, 默認和executor1:1, 也可以通過setNumTasks()配置
Number of worker processes
- Description: How many worker processes to create for the topology across machines in the cluster.
- Configuration option: TOPOLOGY_WORKERS
- How to set in your code (examples):
Number of executors (threads)
- Description: How many executors to spawn per component.
- Configuration option: ?
- How to set in your code (examples):
- TopologyBuilder#setSpout()
- TopologyBuilder#setBolt()
- Note that as of Storm 0.8 the
parallelism_hint
parameter now specifies the initial number of executors (not tasks!) for that bolt.
Number of tasks
- Description: How many tasks to create per component.
- Configuration option: TOPOLOGY_TASKS
- How to set in your code (examples):
Here is an example code snippet to show these settings in practice:
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) .setNumTasks(4) .shuffleGrouping("blue-spout);
In the above code we configured Storm to run the bolt GreenBolt
with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.
Example of a running topology
The following illustration shows how a simple topology would look like in operation.
The topology consists of three components: one spout called BlueSpout
and two bolts called GreenBolt
and YellowBolt
.
The components are linked such that BlueSpout
sends its output to GreenBolt
, which in turns sends its own output toYellowBolt
.
Config conf = new Config(); conf.setNumWorkers(2); // use two worker processes topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2 topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) .setNumTasks(4) //set tasks number to 4 .shuffleGrouping("blue-spout"); topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6) .shuffleGrouping("green-bolt"); StormSubmitter.submitTopology( "mytopology", conf, topologyBuilder.createTopology() );
圖和代碼, 很清晰, 通過setBolt和setSpout一共定義2+2+6=10個executor threads
並且同setNumWorkers設置2個workers, 所以storm會平均在每個worker上run 5個executors
而對於green-bolt, 定義了4個tasks, 所以每個executor中有2個tasks
How to change the parallelism of a running topology, 動態的改變並發度
Storm支持在不restart topology的情況下, 動態的改變(增減)worker processes的數目和executors的數目, 稱為rebalancing.
通過Storm web UI, 或者通過storm rebalance命令, 見下麵的例子
A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.
You have two options to rebalance a topology:
- Use the Storm web UI to rebalance the topology.
- Use the CLI tool storm rebalance as described below.
Here is an example of using the CLI tool:
# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
本文章摘自博客園,原文發布日期:2013-05-04
最後更新:2017-05-18 20:37:14
上一篇:
現代化的緩存設計方案
下一篇:
Another URL Shortener using NodeJS
Dev Gridcontrol刪除數據後,獲取當前行數據時出錯
為什麼程序員一定要加班?
Java NIO係列教程(十) Java NIO DatagramChannel
Android 斷點續傳下載
centos 網易yum源出錯解決辦法 mark
使用Cocos2d-x製作三消類遊戲Sushi Crush(第一部分)
jdbc數據庫連接管理封裝工具類,不同使用屬性文件配置數據庫連接信息(2)
JAVA線程中的生產者和消費者問題
Print2Flash出現"System Error. Code:1722. RPC服務器不可用."錯誤解決辦法
tomcat的安裝使用