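<p>Have you tried using <a href="https://spark.apache.org/docs/1.1.1/api/python/pyspark.rdd.RDD-class.html#top" rel="nofollow"><code>top</code></a>? Since you want the highest averages (and the average is the third item in each tuple), you need to pass a <code>lambda</code> function as the <code>key</code> argument so that the tuples are ranked by that item.</p>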
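<pre><code># sc is the SparkContext (predefined in the pyspark shell)
# items = (number_of_ratings, title, avg_rating)
>>> newRDD = sc.parallelize([(3, 'monster', 4), (4, 'minions 3D', 5)])
>>> top_n = 10
# rank by the third element (avg_rating); results come back in descending order
>>> newRDD.top(top_n, key=lambda items: items[2])
[(4, 'minions 3D', 5), (3, 'monster', 4)]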
</code></pre>
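<p>If it helps to see what the <code>key</code> argument does without a Spark cluster, here is a minimal local sketch using the standard library's <code>heapq.nlargest</code>, which takes a <code>key</code> function in the same way. The helper name <code>top_by_avg_rating</code> and the extra sample row are made up for illustration; this is only an analogue on a plain list, not the RDD API itself.</p>

```python
import heapq

# items = (number_of_ratings, title, avg_rating)
# Sample data; the third row is a made-up addition for illustration.
movies = [(3, 'monster', 4), (4, 'minions 3D', 5), (10, 'up', 3)]


def top_by_avg_rating(items, n):
    """Return up to n tuples with the highest avg_rating (index 2), highest first."""
    return heapq.nlargest(n, items, key=lambda item: item[2])


top_two = top_by_avg_rating(movies, 2)
print(top_two)
```

<p>As with <code>top</code>, asking for more items than exist simply returns everything, sorted from highest to lowest by the chosen key.</p>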