<p>When using <a href="/questions/tagged/spark-1.6.2" class="post-tag" title="show questions tagged 'spark-1.6.2'" rel="tag">spark-1.6.2</a> and <a href="/questions/tagged/pyspark" class="post-tag" title="show questions tagged 'pyspark'" rel="tag">pyspark</a>, I saw this:</p>
<p><a href="https://i.stack.imgur.com/m3yQm.png" rel="noreferrer"><img src="https://i.stack.imgur.com/m3yQm.png" alt="enter image description here"/></a></p>
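<p>Here you can see that the number of active tasks is negative (it is the difference between the total number of tasks and the number of completed tasks).</p>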
<p>What is the source of this error?</p>
<hr/>
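<p>Note that I have many executors. However, one task appears to have been idle (I don't see any progress), while another identical task completed normally.</p>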
<hr/>
<p>This is also related: I can confirm that many tasks are being created, since I am using 1k or 2k executors.</p>
<p>The error I am getting is a bit different:</p>
<pre><code>16/08/15 20:03:38 ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
16/08/15 20:07:18 WARN TaskSetManager: Lost task 20652.0 in stage 4.0 (TID 116652, myfoo.com): FetchFailed(BlockManagerId(61, mybar.com, 7337), shuffleId=0, mapId=328, reduceId=20652, message=
org.apache.spark.shuffle.FetchFailedException: java.util.concurrent.TimeoutException: Timeout waiting for task.
</code></pre>