<p>Use the <a href="https://spark.apache.org/docs/latest/api/sql/#arrays_zip" rel="nofollow noreferrer"><code>arrays_zip</code></a> function with <code>explode</code>, then finish with a group by and pivot:</p>
<pre><code>from pyspark.sql.functions import explode, expr, first
df1 = df.withColumn("name_score", explode(expr("arrays_zip(name, score)"))) \
.selectExpr("id", "name_score.*")\
.groupBy("id").pivot("name").agg(first("score"))
df1.show(truncate=False)
#+----+-----+-------+------+------+---+----+
#|id  |F1   |F2     |F3    |F4    |F5 |F6  |
#+----+-----+-------+------+------+---+----+
#|ab01|00123|0.001  |0.127 |0.0123|111|null|
#|ab04|00345|0.01112|0.1567|0.0186|555|null|
#|ab03|00234|0.078  |0.188 |0.0144|188|null|
#|ab02|00124|0.003  |0.156 |0.067 |156|254 |
#+----+-----+-------+------+------+---+----+
</code></pre>
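<p>The pipeline above can be illustrated in plain Python: <code>arrays_zip</code> pairs the two arrays element-wise, <code>explode</code> flattens each pair into its own row, and the group-by/pivot turns the names into columns. A rough sketch of the equivalent logic (the sample rows here are illustrative, not the question's full data):</p>
<pre><code># Plain-Python illustration of arrays_zip -> explode -> groupBy/pivot.
rows = [
    {"id": "ab01", "name": ["F1", "F2"], "score": ["00123", "0.001"]},
    {"id": "ab02", "name": ["F1", "F6"], "score": ["00124", "254"]},
]

# arrays_zip + explode: one (id, name, score) record per array element
exploded = [
    {"id": r["id"], "name": n, "score": s}
    for r in rows
    for n, s in zip(r["name"], r["score"])
]

# groupBy("id").pivot("name"): one dict per id, names become columns
pivoted = {}
for rec in exploded:
    pivoted.setdefault(rec["id"], {})[rec["name"]] = rec["score"]

print(pivoted)
# {'ab01': {'F1': '00123', 'F2': '0.001'}, 'ab02': {'F1': '00124', 'F6': '254'}}
</code></pre>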
<p>For <strong>Spark &lt; 2.4</strong>:</p>
<p>Your UDF <code>combine</code> works fine. You get the error <code>TypeError: zip argument #1 must support iteration</code> because you are passing the column names to the UDF as strings; pass the columns using <code>col</code> instead:</p>
<pre><code>from pyspark.sql.functions import col

df1 = df.withColumn("name_score", explode(combine(col("name"), col("score")))) \
.selectExpr("id", "name_score.*") \
.groupBy("id").pivot("name").agg(first("score"))
</code></pre>
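<p>For reference, the <code>combine</code> UDF itself isn't shown here. A minimal sketch of one that mimics <code>arrays_zip</code> might look like the following (the helper name <code>zip_arrays</code> and the struct field names are assumptions; adjust the field types to match your schema):</p>
<pre><code>from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

def zip_arrays(names, scores):
    # Pair each name with the score at the same position;
    # guard against null arrays so the UDF never raises.
    if names is None or scores is None:
        return []
    return list(zip(names, scores))

# Return type mirrors arrays_zip: an array of (name, score) structs,
# so name_score.* expands to the two columns used by the pivot.
combine = udf(
    zip_arrays,
    ArrayType(StructType([
        StructField("name", StringType()),
        StructField("score", StringType()),
    ])),
)
</code></pre>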