閱讀943 返回首頁    go 阿裏雲 go 技術社區[雲棲]


Hadoop Streaming__Hadoop_開發人員指南_E-MapReduce-阿裏雲

python 寫hadoop streaming作業

mapper代碼如下

  1. #!/usr/bin/env python
  2. import sys
  3. for line in sys.stdin:
  4. line = line.strip()
  5. words = line.split()
  6. for word in words:
  7. print '%st%s' % (word, 1)

reducer代碼如下

  1. #!/usr/bin/env python
  2. from operator import itemgetter
  3. import sys
  4. current_word = None
  5. current_count = 0
  6. word = None
  7. for line in sys.stdin:
  8. line = line.strip()
  9. word, count = line.split('t', 1)
  10. try:
  11. count = int(count)
  12. except ValueError:
  13. continue
  14. if current_word == word:
  15. current_count += count
  16. else:
  17. if current_word:
  18. print '%st%s' % (current_word, current_count)
  19. current_count = count
  20. current_word = word
  21. if current_word == word:
  22. print '%st%s' % (current_word, current_count)

假設mapper代碼保存在/home/hadoop/mapper.py, reducer代碼保存在/home/hadoop/reducer.py , 輸入路徑為hdfs文件係統的/tmp/input,輸出路徑為hdfs文件係統的/tmp/output。則在E-MapReduce集群上提交下麵的hadoop命令

hadoop jar /usr/lib/hadoop-current/share/hadoop/tools/lib/hadoop-streaming-*.jar -file /home/hadoop/mapper.py -mapper mapper.py -file /home/hadoop/reducer.py -reducer reducer.py -input /tmp/hosts -output /tmp/output

最後更新:2016-11-23 16:04:18

  上一篇:go Pig 開發手冊__Hadoop_開發人員指南_E-MapReduce-阿裏雲
  下一篇:go HBase 開發手冊__開發人員指南_E-MapReduce-阿裏雲