<p>Try this:</p>
<pre class="lang-py prettyprint-override"><code>from pyspark.sql import functions as F, Window as W
df.withColumn(
    "id2",
    F.first("id2").over(
        W.partitionBy("grp")
        .orderBy("row")
        .rowsBetween(W.unboundedPreceding, W.currentRow)
    ),
).show()
+---+----+---+---+
|id1| id2|row|grp|
+---+----+---+---+
| 12|1234|  1|  1|
| 23|1234|  2|  1|
| 65|2345|  1|  2|
| 45|2345|  2|  2|
| 45|2345|  3|  2|
+---+----+---+---+
</code></pre>
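<p>For intuition, here is the same "take the first <code>id2</code> per <code>grp</code>, ordered by <code>row</code>" logic sketched in plain Python (no Spark needed). The tuple layout <code>(id1, id2, row, grp)</code> and the sample values are taken from the output above; this is only an illustration of what the window does, not the Spark implementation:</p>

```python
from itertools import groupby

# Sample rows as (id1, id2, row, grp), matching the table in the answer.
rows = [
    (12, 1234, 1, 1),
    (23, 1234, 2, 1),
    (65, 2345, 1, 2),
    (45, 2345, 2, 2),
    (45, 2345, 3, 2),
]

# Sort by (grp, row) and, within each grp partition, overwrite id2 with the
# first id2 seen -- mirroring F.first("id2").over(partitionBy("grp").orderBy("row")).
out = []
for grp, members in groupby(sorted(rows, key=lambda r: (r[3], r[2])), key=lambda r: r[3]):
    members = list(members)
    first_id2 = members[0][1]  # value of id2 on the first row of the partition
    out.extend((id1, first_id2, row, g) for id1, _, row, g in members)
```

<p>Note that if some <code>id2</code> values can be null and you want the first <em>non-null</em> one, pass <code>ignorenulls=True</code> to <code>F.first</code> in the Spark version.</p>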