<p>I would use <code>coalesce</code>:</p>
<pre><code>from pyspark.sql.functions import col, when, coalesce, lit
df = spark.createDataFrame([
    ("2015-10-25", "2015-09-25", "2015-10-25", "2015-11-25", "2015-12-25"),
    ("2012-07-16", "2012-04-16", "2012-05-16", "2012-06-16", "2012-07-16"),
    ("2005-03-14", "2005-07-14", "2005-08-14", "2005-09-14", "2005-10-14"),
], ("MainDate", "Date1", "Date2", "Date3", "Date4"))

df.withColumn(
    "REAL",
    coalesce(*[when(col(c) == col("MainDate"), lit(c)) for c in df.columns[1:]])
).show()
+----------+----------+----------+----------+----------+-----+
|  MainDate|     Date1|     Date2|     Date3|     Date4| REAL|
+----------+----------+----------+----------+----------+-----+
|2015-10-25|2015-09-25|2015-10-25|2015-11-25|2015-12-25|Date2|
|2012-07-16|2012-04-16|2012-05-16|2012-06-16|2012-07-16|Date4|
|2005-03-14|2005-07-14|2005-08-14|2005-09-14|2005-10-14| null|
+----------+----------+----------+----------+----------+-----+
</code></pre>
<p>where</p>
<pre><code>when(col(c) == col("MainDate"), lit(c))
</code></pre>
<p>returns the column name (<code>lit(c)</code>) if there is a match, and <code>NULL</code> otherwise.</p>
<p>This should be significantly faster than a <code>udf</code> or converting to an <code>RDD</code>.</p>
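<p>To illustrate the "first non-null wins" semantics that <code>coalesce</code> applies to the list of <code>when</code> expressions, here is a minimal pure-Python sketch of the same row-wise logic (the <code>coalesce_py</code> helper and the <code>row</code> dict are illustrative, not part of PySpark):</p>
<pre><code>def coalesce_py(*values):
    """Return the first value that is not None, mimicking SQL COALESCE."""
    for v in values:
        if v is not None:
            return v
    return None

# One row of the example DataFrame, as a plain dict.
row = {"MainDate": "2015-10-25", "Date1": "2015-09-25",
       "Date2": "2015-10-25", "Date3": "2015-11-25", "Date4": "2015-12-25"}

# when(col(c) == col("MainDate"), lit(c)) evaluates to c on a match, else None.
candidates = [c if row[c] == row["MainDate"] else None
              for c in ("Date1", "Date2", "Date3", "Date4")]

print(coalesce_py(*candidates))  # Date2
</code></pre>
<p>Because <code>when</code> without an <code>otherwise</code> yields <code>NULL</code> on no match, <code>coalesce</code> picks the first matching column name, or <code>NULL</code> when no column matches (the <code>null</code> in the last row above).</p>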