<p>With the direct approach, you should not create multiple streams from a single topic.</p>
<p>From the <a href="http://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html" rel="nofollow noreferrer">documentation</a>:</p>
<blockquote>
<p>Simplified Parallelism: No need to create multiple input Kafka streams
and union them. With directStream, Spark Streaming will create as many
RDD partitions as there are Kafka partitions to consume, which will
all read data from Kafka in parallel. So there is a one-to-one mapping
between Kafka and RDD partitions, which is easier to understand and
tune.</p>
</blockquote>
<p>So just create a single DStream and Spark will use all of the Kafka partitions :)</p>
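<p>A minimal sketch of the direct approach from the 0.8 integration guide, assuming a hypothetical broker address and topic name; the single <code>createDirectStream</code> call below yields one RDD partition per Kafka partition, so no union of multiple streams is needed:</p>

<pre><code>import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("DirectKafkaExample")
val ssc = new StreamingContext(conf, Seconds(5))

// "broker1:9092" and "my-topic" are placeholders for your setup
val kafkaParams = Map[String, String]("metadata.broker.list" -&gt; "broker1:9092")
val topics = Set("my-topic")

// One direct stream; Spark creates as many RDD partitions
// as the topic has Kafka partitions, all reading in parallel
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.map(_._2).print()  // values only; keys are in _._1

ssc.start()
ssc.awaitTermination()
</code></pre>

<p>If the topic has, say, 8 partitions, <code>stream</code>'s underlying RDDs will have 8 partitions each, which you can then repartition downstream if you need a different parallelism for processing.</p>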