from pyspark.sql import functions as F
from pyspark.sql import types as T
df = spark.createDataFrame(
    [(1, "This is a Horse"), (2, "Monkey Loves trees"),
     (3, "House has a tree"), (4, "The Ocean is Cold")],
    ["col1", "col2"])
df.show(truncate=False)
Output:

+----+------------------+
|col1|col2              |
+----+------------------+
|1   |This is a Horse   |
|2   |Monkey Loves trees|
|3   |House has a tree  |
|4   |The Ocean is Cold |
+----+------------------+
Here is a workable solution for you: use the array_contains() function instead of looping over each item. To make it work we need one small adjustment: the string column has to be turned into an array first. The DataFrame above was created for that purpose.
The logic is here: use split() to convert the string column to ArrayType.