<p>Here is a list of operations that <em>may</em> cause a shuffle:</p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@cogroup[W1,W2,W3](other1:org.apache.spark.rdd.RDD[(K,W1)],other2:org.apache.spark.rdd.RDD[(K,W2)],other3:org.apache.spark.rdd.RDD[(K,W3)],partitioner:org.apache.spark.Partitioner):org.apache.spark.rdd.RDD[(K,(Iterable[V],Iterable[W1],Iterable[W2],Iterable[W3]))]" rel="noreferrer"><code>cogroup</code></a></p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@groupWith[W1,W2,W3](other1:org.apache.spark.rdd.RDD[(K,W1)],other2:org.apache.spark.rdd.RDD[(K,W2)],other3:org.apache.spark.rdd.RDD[(K,W3)]):org.apache.spark.rdd.RDD[(K,(Iterable[V],Iterable[W1],Iterable[W2],Iterable[W3]))]" rel="noreferrer"><code>groupWith</code></a></p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@join[W](other:org.apache.spark.rdd.RDD[(K,W)],numPartitions:Int):org.apache.spark.rdd.RDD[(K,(V,W))]" rel="noreferrer"><code>join</code></a>: hash partitioned</p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@leftOuterJoin[W](other:org.apache.spark.rdd.RDD[(K,W)],numPartitions:Int):org.apache.spark.rdd.RDD[(K,(V,Option[W]))]" rel="noreferrer"><code>leftOuterJoin</code></a>: hash partitioned</p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@rightOuterJoin[W](other:org.apache.spark.rdd.RDD[(K,W)],numPartitions:Int):org.apache.spark.rdd.RDD[(K,(Option[V],W))]" rel="noreferrer"><code>rightOuterJoin</code></a>: hash partitioned</p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@groupByKey():org.apache.spark.rdd.RDD[(K,Iterable[V])]" rel="noreferrer"><code>groupByKey</code></a>: hash partitioned</p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@reduceByKey(func:(V,V)=%3EV):org.apache.spark.rdd.RDD[(K,V)]" rel="noreferrer"><code>reduceByKey</code></a>: hash partitioned</p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@combineByKey[C](createCombiner:V=%3EC,mergeValue:(C,V)=%3EC,mergeCombiners:(C,C)=%3EC):org.apache.spark.rdd.RDD[(K,C)]" rel="noreferrer"><code>combineByKey</code></a>: hash partitioned</p>
<p><a href="https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/rdd/OrderedRDDFunctions.html#sortByKey(boolean,%20int)" rel="noreferrer"><code>sortByKey</code></a>: range partitioned</p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@distinct():org.apache.spark.rdd.RDD[T]" rel="noreferrer"><code>distinct</code></a></p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@intersection(other:org.apache.spark.rdd.RDD[T],numPartitions:Int):org.apache.spark.rdd.RDD[T]" rel="noreferrer"><code>intersection</code></a>: hash partitioned</p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@repartition(numPartitions:Int)(implicitord:Ordering[T]):org.apache.spark.rdd.RDD[T]" rel="noreferrer"><code>repartition</code></a></p>
<p><a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@coalesce(numPartitions:Int,shuffle:Boolean,partitionCoalescer:Option[org.apache.spark.rdd.PartitionCoalescer])(implicitord:Ordering[T]):org.apache.spark.rdd.RDD[T]" rel="noreferrer"><code>coalesce</code></a></p>
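<p>As a minimal sketch of the point above (assuming a local Spark setup; the app name and partition count are arbitrary): <code>reduceByKey</code> hash-partitions its input the first time, causing a shuffle, but if the RDD is already partitioned with a known partitioner, later key-based operations can reuse that layout instead of shuffling again:</p>

```scala
import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

object ShuffleSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("shuffle-sketch").setMaster("local[*]"))

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // First grouping by key: Spark must co-locate equal keys,
    // so this triggers a shuffle (hash partitioning by default).
    val counts = pairs.reduceByKey(_ + _)

    // Pre-partitioning once (and caching) pays the shuffle cost up
    // front; subsequent reduceByKey/join calls on the same keys can
    // reuse the partitioner and avoid re-shuffling.
    val prePartitioned = pairs.partitionBy(new HashPartitioner(4)).cache()
    val countsReused = prePartitioned.reduceByKey(_ + _)

    // Both results carry a HashPartitioner in their metadata.
    println(counts.partitioner)
    println(countsReused.partitioner)

    sc.stop()
  }
}
```

<p>This is the optimization the linked lecture builds toward: the list above tells you which operations introduce a partitioner (and hence a shuffle), so you can place one <code>partitionBy</code> early and let the rest of the pipeline reuse it.</p>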
<p>Source: <a href="https://www.coursera.org/learn/scala-spark-big-data/lecture/LQT67/optimizing-with-partitioners" rel="noreferrer">Big Data Analysis with Spark and Scala</a>, Optimizing with Partitioners, Coursera</p>