The Spring-Hadoop Project

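As a Java engineer, I am bound to get deeply involved with Spring, and once I start working full time I will certainly be building a lot on top of it. Happily, I discovered that there is already a Spring-based project that ties in with Hadoop, which I am fond of.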

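In March of this year, VMware released Spring Hadoop, which supports writing Hadoop applications within the Spring framework. Spring-Hadoop appears to be part of the Spring Data project (Spring Data also covers integrating Spring with JDBC, REST, and the mainstream NoSQL stores). So what does combining Spring and Hadoop actually give you? In essence, the configuration of Hadoop components, job deployment, and similar concerns are all brought under Spring's bean management.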

Straight to the point

Without further ado, let's start with an example. MapReduce has its own counterpart to "Hello World", namely "Word Count". In Spring Hadoop it looks like this:

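<!-- configure Apache Hadoop FS/job tracker using defaults -->
<hdp:configuration />
 
<!-- define the job; the id is the name that jobs-ref points to below -->
<hdp:job id="word-count"
  input-path="/input/" output-path="/output/"
  mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper"
  reducer="org.apache.hadoop.examples.WordCount.IntSumReducer"/>
 
<!-- execute the job (bean id and class are assumed: Spring Hadoop's JobRunner) -->
<bean id="runner" class="org.springframework.data.hadoop.mapreduce.JobRunner"
      p:jobs-ref="word-count"/>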

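As you can see, both the job's parameter configuration and its submission are managed by the IoC container. If the Mapper or Reducer needs additional parameters, those can be configured as well.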
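
The snippets in this post omit the surrounding `<beans>` element and do not show where extra settings would go. The following is a minimal sketch of a complete context file, assuming the `hdp` namespace URI and schema location used by the Spring for Apache Hadoop reference documentation. The host names and ports are placeholders, and `example.min.word.length` is a hypothetical key invented purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:p="http://www.springframework.org/schema/p"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd
         http://www.springframework.org/schema/hadoop
         http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

  <!-- Hadoop settings are written as key=value lines inside the element.
       The addresses below are placeholders for a local test cluster. -->
  <hdp:configuration>
    fs.default.name=hdfs://localhost:9000
    mapred.job.tracker=localhost:9001
    example.min.word.length=3
  </hdp:configuration>

  <!-- job definitions and the runner bean go here -->
</beans>
```

Values placed in the Hadoop configuration travel with the job, so a Mapper or Reducer can read them from its own `Configuration` object at run time.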

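At the same time, Spring Hadoop does not require MapReduce programs to be written in Java. Streaming jobs written in other languages can be plugged into the Spring configuration and run seamlessly. These jobs are all objects, and to Spring they are simply beans: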

<hdp:streaming 
  input-path="/input/" output-path="/output/"
  mapper="${path.cat}" reducer="${path.wc}">
  <hdp:cmd-env>
    EXAMPLE_DIR=/home/example/dictionaries/
  </hdp:cmd-env>
</hdp:streaming>
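
The `${path.cat}` and `${path.wc}` values in the streaming job are ordinary Spring property placeholders, so something has to resolve them. One way to do that is sketched here, assuming the Spring `context` namespace is declared on the enclosing `<beans>` element and that a file named `streaming.properties` (a hypothetical name) is on the classpath:

```xml
<!-- streaming.properties (hypothetical file) would contain, for example:
       path.cat=/bin/cat
       path.wc=/usr/bin/wc
     The executable paths are assumptions and depend on the operating system. -->
<context:property-placeholder location="classpath:streaming.properties"/>
```

Keeping the executable paths in a properties file lets the same job definition run on machines where `cat` and `wc` live in different locations.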

Other existing Hadoop tools are supported as well. Take Twitter's Scalding below, a Scala library for writing MapReduce jobs:

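<!-- the tool automatically is injected with 'hadoop-configuration' -->
<!-- id and tool-class values are assumed; com.twitter.scalding.Tool is Scalding's Hadoop Tool entry point -->
<hdp:tool-runner id="scalding" tool-class="com.twitter.scalding.Tool">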
   <hdp:arg value="tutorial/Tutorial1"/>
   <hdp:arg value="--local"/>
</hdp:tool-runner>

Key features

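- MapReduce, Streaming, Hive, Pig, and Cascading jobs can all be executed through the Spring container.
- HDFS data can be accessed from scripting languages that run on the JVM, such as Groovy, JRuby, and Jython.
- Declarative configuration of HBase is supported.
- For clients connecting to Hadoop, it provides powerful Hadoop configuration options and a template mechanism.
- Support for Hadoop tools, including FsShell and DistCp, is also planned.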

In short, the configuration and creation of the various Hadoop components are tied into the Spring container and managed in one place.

More examples

Let's look at a few more representative examples.

HBase and Pig:

<!-- HBase configuration with nested properties -->
<hdp:hbase-configuration stop-proxy="false" delete-connection="true">
    foo=bar
</hdp:hbase-configuration>
 
<!-- create a Pig instance using custom properties
    and execute a script (using given arguments) at startup -->
 
<hdp:pig properties-location="pig-dev.properties">
   <script location="org/company/pig/script.pig">
     <arguments>electric=tears</arguments>
   </script>
</hdp:pig>

Hive:

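<!-- basic Hive driver bean (class assumed: the HiveServer1 JDBC driver) -->
<bean id="hive-driver" class="org.apache.hadoop.hive.jdbc.HiveDriver"/>
 
<!-- wrapping a basic datasource around the driver (class assumed) -->
<bean id="hive-ds"
    class="org.springframework.jdbc.datasource.SimpleDriverDataSource"
    c:driver-ref="hive-driver" c:url="${hive.url}"/>
 
<!-- standard JdbcTemplate declaration (bean id is illustrative) -->
<bean id="template" class="org.springframework.jdbc.core.JdbcTemplate"
    c:data-source-ref="hive-ds"/>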

The next example uses Groovy to perform file operations on HDFS. Its purpose is to show that languages running on the JVM can interact with HDFS.

<hdp:script language="groovy">
  inputPath = "/user/gutenberg/input/word/"
  outputPath = "/user/gutenberg/output/word/"
 
  if (fsh.test(inputPath)) {
    fsh.rmr(inputPath)
  }
 
  if (fsh.test(outputPath)) {
    fsh.rmr(outputPath)
  }
 
  fs.copyFromLocalFile("data/input.txt", inputPath)
</hdp:script>

Summary

For more detailed documentation and usage, see the spring-hadoop project on GitHub.

(End)

Last updated: 2017-04-04 07:03:06
