

Flume MaxCompute Sink Plugin (Alibaba Cloud Data Integration: Moving Data to the Cloud)

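Apache Flume is a distributed, reliable, and highly available system for efficiently collecting, aggregating, and moving large volumes of log data from many different sources to a centralized data store.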

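ODPS Sink is a Flume plugin built on the ODPS DataHub Service that imports Flume Event data into ODPS (now MaxCompute). The plugin keeps all of Flume's existing features, supports user-defined partitions on ODPS tables, and can create partitions automatically.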

2. Environment Requirements

1. JDK (1.6 or later; 1.7 recommended)

2. Flume-NG 1.x
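You can check the JDK requirement before deploying. The sketch below assumes `java` is on the PATH and that the first line of `java -version` contains the version in double quotes, as Oracle and OpenJDK builds do. `java_major_version` is a hypothetical helper name. Legacy version strings such as `1.7.0_80` map to their second component, so JDK 1.7 reports 7.

```shell
#!/bin/sh
# Sketch: extract the Java major version from `java -version` style output.
# java_major_version is a hypothetical helper; it reads the version text on stdin.
java_major_version() {
    # Take the quoted version string, e.g. 1.7.0_80 or 11.0.2
    ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case $ver in
        1.*) echo "$ver" | cut -d. -f2 ;;   # legacy scheme: 1.7.x -> 7
        *)   echo "$ver" | cut -d. -f1 ;;   # modern scheme: 11.0.2 -> 11
    esac
}

# Usage: warn if the installed JDK is older than 1.6 (the plugin's stated minimum)
# major=$(java -version 2>&1 | java_major_version)
# [ "$major" -ge 6 ] || echo "JDK 1.6 or later is required" >&2
```

`java -version` writes to stderr, which is why the usage line redirects `2>&1` before piping.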

3. Plugin Deployment

1. Download the ODPS Sink plugin and extract it: aliyun-odps-flume-plugin

2. Deploy the ODPS Sink plugin by moving the odps_sink folder into the plugins.d directory of your Apache Flume installation:

  $ mkdir {YOUR_APACHE_FLUME_DIR}/plugins.d
  $ mv odps_sink/ {YOUR_APACHE_FLUME_DIR}/plugins.d/

After moving the folder, verify that the ODPS Sink plugin is in the expected directory:

  $ ls {YOUR_APACHE_FLUME_DIR}/plugins.d
  odps_sink
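The deployment and verification commands above can be combined into one script. This is a sketch, not part of the plugin. The function name `deploy_odps_sink` is hypothetical, and the script assumes it runs from the directory where the archive was extracted, so that `odps_sink/` is in the current directory.

```shell
#!/bin/sh
# Sketch: move the extracted odps_sink/ folder into Flume's plugins.d and verify it.
# deploy_odps_sink is a hypothetical helper; $1 is your Apache Flume install dir.
# Run it from the directory that contains the extracted odps_sink/ folder.
deploy_odps_sink() {
    flume_dir=$1
    target="$flume_dir/plugins.d/odps_sink"
    if [ ! -d odps_sink ]; then
        echo "odps_sink/ not found in $(pwd)" >&2
        return 1
    fi
    # Refuse to overwrite: mv would otherwise nest it as odps_sink/odps_sink
    if [ -e "$target" ]; then
        echo "$target already exists" >&2
        return 1
    fi
    # -p: do not fail if plugins.d already exists
    mkdir -p "$flume_dir/plugins.d" || return 1
    mv odps_sink "$flume_dir/plugins.d/" || return 1
    # Same check as the ls command: the plugin folder must now be present
    if [ -d "$target" ]; then
        echo "odps_sink installed in $flume_dir/plugins.d"
    else
        echo "odps_sink not found in $flume_dir/plugins.d" >&2
        return 1
    fi
}
```

For example, `deploy_odps_sink /opt/apache-flume` installs the plugin there (the path is only an illustration). The overwrite check matters because `mv` into an existing `odps_sink/` would silently nest the new copy inside the old one.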

After deployment, set the sink's type field in the Flume configuration file to the following value, and the plugin is ready to use:

  com.aliyun.odps.flume.sink.OdpsSink

4. Configuration Example

Example: parse structured data from a log file and upload it to an ODPS table.

The log file to upload has the following format (one record per line, fields separated by commas):

  #test_basic.log
  some,log,line1
  some,log,line2
  ...
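If you want to try the example without real logs, you can generate a sample file in this format. This is a minimal sketch. The helper name `make_test_log` and its line-count argument are illustrative only. The `#test_basic.log` label in the listing is treated as a file name, not as a line of the file.

```shell
#!/bin/sh
# Sketch: write N comma-separated records in the test_basic.log format
# ("some,log,lineK", one record per line). make_test_log is a hypothetical helper.
make_test_log() {
    out_file=$1
    count=$2
    : > "$out_file"            # create or truncate the file
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'some,log,line%d\n' "$i" >> "$out_file"
        i=$((i + 1))
    done
}
```

Running `make_test_log {YOUR_LOG_DIRECTORY}/test_basic.log 2` produces the two sample records shown above.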

Step 1: Create an ODPS DataHub table in your ODPS project

Use the following CREATE TABLE statement:

  CREATE TABLE hub_table_basic (col1 STRING, col2 STRING)
  PARTITIONED BY (pt STRING)
  INTO 1 SHARDS
  HUBLIFECYCLE 1;

Step 2: Create the Flume job configuration file

In the conf/ folder of the Flume installation directory, create a file named odps_basic.conf with the following contents:

  # odps_basic.conf
  # A single-node Flume configuration for ODPS
  # Name the components on this agent
  a1.sources = r1
  a1.sinks = k1
  a1.channels = c1
  # Describe/configure the source
  a1.sources.r1.type = exec
  a1.sources.r1.command = cat {YOUR_LOG_DIRECTORY}/test_basic.log
  # Describe the sink
  a1.sinks.k1.type = com.aliyun.odps.flume.sink.OdpsSink
  a1.sinks.k1.accessID = {YOUR_ALIYUN_ODPS_ACCESS_ID}
  a1.sinks.k1.accessKey = {YOUR_ALIYUN_ODPS_ACCESS_KEY}
  a1.sinks.k1.odps.endPoint = https://service.odps.aliyun.com/api
  a1.sinks.k1.odps.datahub.endPoint = https://dh.odps.aliyun.com
  a1.sinks.k1.odps.project = {YOUR_ALIYUN_ODPS_PROJECT}
  a1.sinks.k1.odps.table = hub_table_basic
  a1.sinks.k1.odps.partition = 20150814
  a1.sinks.k1.batchSize = 100
  a1.sinks.k1.serializer = DELIMITED
  a1.sinks.k1.serializer.delimiter = ,
  # Input fields map to table columns by position; the empty entry skips the 2nd field
  a1.sinks.k1.serializer.fieldnames = col1,,col2
  a1.sinks.k1.serializer.charset = UTF-8
  a1.sinks.k1.shard.number = 1
  a1.sinks.k1.shard.maxTimeOut = 60
  a1.sinks.k1.autoCreatePartition = true
  # Use a channel which buffers events in memory
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # Bind the source and sink to the channel
  a1.sources.r1.channels = c1
  a1.sinks.k1.channel = c1
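The line `serializer.fieldnames = col1,,col2` maps three comma-separated input fields onto the two-column table. As the example implies, an empty entry means that input field is not written. The awk sketch below only mimics that positional mapping so you can preview what each column would receive. It is an illustration under that assumption, not the plugin's serializer, and `preview_mapping` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: preview what each table column would receive from each log line.
# Assumes positional mapping in which an empty name in fieldnames means
# "do not write this input field". preview_mapping is a hypothetical helper
# and does not use the ODPS Sink plugin itself.
preview_mapping() {
    fieldnames=$1   # e.g. col1,,col2
    delimiter=$2    # a single character, e.g. ,
    log_file=$3
    awk -F "$delimiter" -v names="$fieldnames" '
        BEGIN { n = split(names, col, ",") }   # col[1]="col1", col[2]="", col[3]="col2"
        {
            out = ""; sep = ""
            for (i = 1; i <= n && i <= NF; i++) {
                if (col[i] == "") continue     # empty name: skip input field i
                out = out sep col[i] "=" $i
                sep = " "
            }
            print out
        }' "$log_file"
}
```

For the sample file, `preview_mapping "col1,,col2" "," {YOUR_LOG_DIRECTORY}/test_basic.log` prints `col1=some col2=line1` for the first record. The middle field `log` is dropped.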

Step 3: Start Flume

Start Flume, specifying the agent name and the path to the configuration file. The -Dflume.root.logger=INFO,console option prints the log to the console in real time.

  $ cd {YOUR_APACHE_FLUME_DIR}
  $ bin/flume-ng agent -n a1 -c conf -f conf/odps_basic.conf -Dflume.root.logger=INFO,console

When the write succeeds, the log shows:

  ...
  Write success. Event count: 2
  ...

The uploaded records can then be queried in the ODPS DataHub table.
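To check that every record in the input file was delivered, you can sum the `Event count` values printed by the sink and compare the total with the number of lines in the source log. This sketch makes two assumptions. First, the console output was saved to a file, for example by appending `> flume.out 2>&1` to the start command above. Second, success lines keep the `Write success. Event count: N` form shown in the sample log. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: sum the "Write success. Event count: N" values in saved Flume output.
# count_written_events is hypothetical; the message format is taken from the
# sample log and may differ between plugin versions.
count_written_events() {
    grep -o 'Write success\. Event count: [0-9][0-9]*' "$1" \
        | awk '{ total += $NF } END { print total + 0 }'
}

# Compare the events reported by the sink with the lines in the source log.
# check_delivery is hypothetical; $1 = saved Flume output, $2 = source log file.
check_delivery() {
    flume_out=$1
    source_log=$2
    written=$(count_written_events "$flume_out")
    expected=$(wc -l < "$source_log" | tr -d ' ')
    if [ "$written" -eq "$expected" ]; then
        echo "OK: $written of $expected events written"
    else
        echo "MISMATCH: $written of $expected events written" >&2
        return 1
    fi
}
```

With the two-line sample log and the output shown above, `check_delivery flume.out {YOUR_LOG_DIRECTORY}/test_basic.log` reports `OK: 2 of 2 events written`. If no success line is found, the count is 0 rather than an error.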

5. Learn More

Apache Flume User Guide

ODPS Sink plugin repository: aliyun-odps-flume-plugin

ODPS Sink configuration parameter reference

More ODPS Sink configuration examples

Last updated: 2016-11-24 11:23:47
