<p>You can convert the DataFrame to an RDD and append a new field to each row by checking which date column matches <code>MainDate</code>:</p>
<pre><code>from pyspark.sql import Row
from pyspark.sql.types import StringType

# assumes an existing SparkSession named `spark` (as in the pyspark shell)
df = spark.read.option("header", True).option("inferSchema", True).csv("test.csv")

# collect the columns to compare against MainDate
dates = [col for col in df.columns if col.startswith('Date')]

# for each row, take the first date column equal to MainDate; None if nothing matches
rdd = df.rdd.map(lambda row: row + Row(REAL=next((col for col in dates if row[col] == row['MainDate']), None)))

# rebuild the DataFrame from the RDD, extending the schema with the new string column
spark.createDataFrame(rdd, df.schema.add("REAL", StringType(), True)).show()
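+--------------------+--------------------+--------------------+--------------------+--------------------+-----+
|            MainDate|               Date1|               Date2|               Date3|               Date4| REAL|
+--------------------+--------------------+--------------------+--------------------+--------------------+-----+
|2015-10-25 00:00:...|2015-09-25 00:00:...|2015-10-25 00:00:...|2015-11-25 00:00:...|2015-12-25 00:00:...|Date2|
|2012-07-16 00:00:...|2012-04-16 00:00:...|2012-05-16 00:00:...|2012-06-16 00:00:...|2012-07-16 00:00:...|Date4|
|2005-03-14 00:00:...|2005-07-14 00:00:...|2005-08-14 00:00:...|2005-09-14 00:00:...|2005-10-14 00:00:...| null|
+--------------------+--------------------+--------------------+--------------------+--------------------+-----+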
</code></pre>
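<p>The matching step inside the <code>lambda</code> relies on <code>next(generator, default)</code>, which returns the first item the generator yields or the default when it yields nothing. Here is a minimal plain-Python sketch of that logic, with no Spark involved. The helper name <code>first_matching_column</code> and the sample rows are hypothetical and exist only for illustration:</p>

```python
def first_matching_column(row, candidates, target="MainDate"):
    """Return the first name in `candidates` whose value equals row[target], else None."""
    # next() stops at the first match; the second argument is the fallback value
    return next((name for name in candidates if row[name] == row[target]), None)


# hypothetical sample rows, using plain dicts in place of Spark Row objects
rows = [
    {"MainDate": "2015-10-25", "Date1": "2015-09-25", "Date2": "2015-10-25"},
    {"MainDate": "2005-03-14", "Date1": "2005-07-14", "Date2": "2005-08-14"},
]

# same column selection as in the answer: every column whose name starts with "Date"
dates = [name for name in rows[0] if name.startswith("Date")]

labels = [first_matching_column(r, dates) for r in rows]
print(labels)
```

<p>If several date columns equal <code>MainDate</code>, only the first one in column order is reported, which is also how the RDD version above behaves.</p>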