Error when using Elephant Bird input formats in Hadoop streaming

I'm trying to use input formats from <a href="https://github.com/kevinweil/elephant-bird" rel="nofollow">Elephant Bird</a> in my Hadoop streaming script. Specifically, I want to use LzoInputFormat and, eventually, LzoJsonInputFormat (I'm working with Twitter data here). But whenever I try, I get an error saying that the Elephant Bird format is not a valid instance of the InputFormat class.

This is how I run the streaming command:

<pre><code>hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2-cdh3u5.jar \
  -libjars /project/hanna/src/elephant-bird/build/elephant-bird-2.2.0.jar \
  -D stream.map.output.field.separator=\t \
  -D stream.num.map.output.key.fields=2 \
  -D map.output.key.field.separator=\t \
  -D mapred.text.key.partitioner.options=-k1,2 \
  -file /home/a/ahanna/sandbox/hadoop-textual-analysis/streaming/filter/filterMap.py \
  -file /home/a/ahanna/sandbox/hadoop-textual-analysis/streaming/filter/filterReduce.py \
  -file /home/a/ahanna/sandbox/hadoop-textual-analysis/streaming/data/latinKeywords.txt \
  -inputformat com.twitter.elephantbird.mapreduce.input.LzoTextInputFormat \
  -input /user/ahanna/lzotest \
  -output /user/ahanna/output \
  -mapper filterMap.py \
  -reducer filterReduce.py \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
</code></pre>
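The key-related `-D` options in this command ask streaming to treat the first two tab-separated fields of each mapper output line as the key (`stream.num.map.output.key.fields=2`) and ask `KeyFieldBasedPartitioner` to partition on key fields 1 through 2 (`-k1,2`), so every record sharing those two fields reaches the same reducer. The Python sketch below is only a simplified model of that behavior, not Hadoop code and not the asker's `filterMap.py`: `split_key_value` and `toy_partition` are hypothetical names, and `zlib.crc32` stands in for the partitioner's real hash, so the partition numbers will not match a real job.

```python
import zlib
from typing import Tuple

SEP = "\t"          # mirrors stream.map.output.field.separator / map.output.key.field.separator
NUM_KEY_FIELDS = 2  # mirrors stream.num.map.output.key.fields=2


def split_key_value(line: str, sep: str = SEP, num_key_fields: int = NUM_KEY_FIELDS) -> Tuple[str, str]:
    """Split one mapper output line into (key, value).

    The key is everything before the num_key_fields-th separator and the value
    is the rest; if the line has too few separators, the whole line is the key
    and the value is empty.
    """
    line = line.rstrip("\n")
    parts = line.split(sep)
    if len(parts) <= num_key_fields:
        return line, ""
    return sep.join(parts[:num_key_fields]), sep.join(parts[num_key_fields:])


def toy_partition(key: str, num_reducers: int, first: int = 1, last: int = 2, sep: str = SEP) -> int:
    """Pick a reducer from key fields first..last (modeled on -k1,2).

    zlib.crc32 is a stand-in hash, NOT the one KeyFieldBasedPartitioner uses.
    """
    fields = key.split(sep)[first - 1:last]
    return zlib.crc32(sep.join(fields).encode("utf-8")) % num_reducers


if __name__ == "__main__":
    # Made-up sample lines: the first two share fields 1-2 but differ in value.
    for line in ["x\ty\tvalue-1\n", "x\ty\tvalue-2\n", "z\ty\tvalue-3\n"]:
        key, value = split_key_value(line)
        print(repr(key), repr(value), toy_partition(key, num_reducers=4))
```

Both `x\ty` lines land on the same reducer because only fields 1 and 2 feed the hash; the value never affects the partition.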

1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java