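<p>A pickled sklearn random forest (RF) model can be quite large. Frequent pickling and unpickling of the model during task dispatch may be what causes the problem. Consider using a broadcast variable instead.</p>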
<p>From the <a href="http://spark.apache.org/docs/latest/rdd-programming-guide.html#broadcast-variables" rel="nofollow noreferrer">official documentation</a>:</p>
<blockquote>
<p>Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. They can be used, for example, to give every node a copy of a large input dataset in an efficient manner. Spark also attempts to distribute broadcast variables using efficient broadcast algorithms to reduce communication cost.</p>
</blockquote>
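<p>A minimal sketch of how that could look in PySpark, assuming the model is a fitted sklearn estimator and the data is an RDD of feature rows. The names <code>score_partition</code> and <code>main</code>, the synthetic data, and the local master setting are illustrative assumptions, not taken from the question:</p>

```python
def score_partition(rows, bc_model):
    """Score one partition with the model held in a broadcast variable.

    `rows` is an iterator of feature sequences; `bc_model` is any object
    exposing the model through `.value` (a Spark Broadcast in practice).
    """
    # Read the broadcast value here, on the worker, instead of capturing
    # the model itself in the closure that Spark pickles for each task.
    model = bc_model.value
    batch = [list(r) for r in rows]
    if not batch:
        return iter([])
    # One predict call per partition rather than one per row.
    return iter(list(model.predict(batch)))


def main():
    # Imports are local so the helper above has no hard dependency on Spark.
    from pyspark.sql import SparkSession
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    spark = (
        SparkSession.builder.master("local[2]")  # assumption: local test run
        .appName("broadcast-rf-sketch")
        .getOrCreate()
    )
    sc = spark.sparkContext

    # Hypothetical stand-in for the real training data and model.
    X, y = make_classification(n_samples=200, n_features=4, random_state=0)
    model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

    bc_model = sc.broadcast(model)  # create the broadcast once on the driver
    rdd = sc.parallelize(X.tolist(), numSlices=4)
    preds = rdd.mapPartitions(lambda rows: score_partition(rows, bc_model)).collect()
    print(len(preds))

    bc_model.unpersist()  # release the cached copies on the executors
    spark.stop()
```

<p>Calling <code>main()</code> runs the sketch end to end. The point of the design is that the task closure only references <code>bc_model</code>, a small handle, so the large model should not be re-serialized with every task.</p>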