Suppose I have the following df:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ("a", "apple"),
    ("a", "pear"),
    ("b", "pear"),
    ("c", "carrot"),
    ("c", "apple"),
], ["id", "fruit"])
df.show()
+---+------+
| id| fruit|
+---+------+
|  a| apple|
|  a|  pear|
|  b|  pear|
|  c|carrot|
|  c| apple|
+---+------+
Now I want to create a boolean flag for each id that is True if that id has at least one "pear" in the fruit column.
The desired output looks like this:
+---+------+-----+
| id| fruit| flag|
+---+------+-----+
|  a| apple| True|
|  a|  pear| True|
|  b|  pear| True|
|  c|carrot|False|
|  c| apple|False|
+---+------+-----+
With pandas I found a solution using groupby().transform() {a1}, but I don't know how to translate it to PySpark.
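For comparison, here is a minimal sketch of what a pandas groupby().transform() version of this flag might look like. It is not necessarily the linked solution; the names pdf and is_pear are hypothetical. The idea is to compute a per-row boolean, then let transform("any") broadcast each group's result back to every row of that group.

```python
import pandas as pd

# Same data as the Spark example, as a pandas DataFrame (hypothetical name: pdf)
pdf = pd.DataFrame({
    "id": ["a", "a", "b", "c", "c"],
    "fruit": ["apple", "pear", "pear", "carrot", "apple"],
})

# Per-row boolean: is this row a pear?
is_pear = pdf["fruit"].eq("pear")

# transform("any") computes any() per id and repeats it on every row of that id
pdf["flag"] = is_pear.groupby(pdf["id"]).transform("any")
print(pdf)
```

The Spark equivalent of "compute per group, broadcast back to each row" is a window function partitioned by id.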
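Use max as a window function partitioned by id. Since false sorts before true, the maximum of the boolean fruit = 'pear' within an id is true exactly when at least one of that id's rows is "pear".

If you need to check for several fruits, you can use the in operator instead, for example to check for carrot and apple.

If you prefer Python syntax, the same window can be written with the PySpark DataFrame API instead of SQL.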