擅长:python、mysql、java
<p>您需要将<code>SequenceFileAsTextInputFormat</code>作为<code>inputformat</code>提供给hadoop流媒体jar。在</p>
<p>我从未使用过amazon aws mapreduce,但在正常的hadoop安装中,它会这样做:</p>
<pre><code>HADOOP=$HADOOP_HOME/bin/hadoop
$HADOOP jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
-input <input_directory>
-output <output_directory> \
-mapper "mapper.py" \
-reducer "reducer.py" \
-inputformat SequenceFileAsTextInputFormat
</code></pre>