<p>A simple swap followed by a self-join will do this. First, let's create some dummy data and a small helper function:</p>
<pre><code>actor_movie = sc.parallelize([
("actor 1", "movie 1"),
("actor 1", "movie 3"),
("actor 1", "movie 3"),
("actor n", "movie 2")
])
swap = lambda x: (x[1], x[0])
</code></pre>
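<p>Next, swap the order of each pair so that the movie becomes the key, for example with <code>movie_actor = actor_movie.map(swap)</code>.</p>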
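<p>Then join the RDD with itself on the movie key and drop the pairs that match an actor with themselves:</p>
<pre><code>(movie_actor
    .join(movie_actor)                # Self-join by movie
    .values()                         # Extract values (pairs of actors)
    .filter(lambda x: x[0] != x[1]))  # Drop pairs of an actor with themselves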
</code></pre>
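<p>To see what the swap and self-join produce without a running Spark context, here is a minimal plain-Python sketch of the same logic. The function name <code>co_actor_pairs</code> and the sample data are assumptions made for illustration and are not part of the original answer. It mimics the join rather than calling Spark.</p>

```python
from collections import defaultdict
from itertools import product


def co_actor_pairs(actor_movie):
    """Plain-Python stand-in for map(swap) + self-join + filter."""
    # Group actors by movie: the equivalent of keying each record by movie.
    actors_by_movie = defaultdict(list)
    for actor, movie in actor_movie:
        actors_by_movie[movie].append(actor)

    pairs = []
    for actors in actors_by_movie.values():
        # A self-join yields every ordered pair of actors sharing the key.
        for a, b in product(actors, repeat=2):
            if a != b:  # same filter as in the Spark version
                pairs.append((a, b))
    return pairs


# Hypothetical sample data (not from the original answer).
sample = [
    ("actor 1", "movie 1"),
    ("actor 2", "movie 1"),
    ("actor 1", "movie 3"),
    ("actor n", "movie 2"),
]

print(sorted(co_actor_pairs(sample)))
# [('actor 1', 'actor 2'), ('actor 2', 'actor 1')]
```

<p>Note that each pair appears in both orders, just as it does after the Spark self-join. If only one orientation is wanted, filtering with <code>x[0] &lt; x[1]</code> instead of <code>x[0] != x[1]</code> keeps a single copy, and <code>.distinct()</code> removes repeats caused by duplicate input rows.</p>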