Undefined function UDF in PySpark?

2 answers

User

Answer 1 · edited 2024-10-01 13:39:22

A function must be registered before it can be used with expr:

spark.udf.register("incrementAC", incrementAC)

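Also, accumulators updated inside transformations are not reliable: Spark may re-execute a task or stage, and each re-execution applies the update again.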
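Spark only guarantees that an accumulator update is applied exactly once when it happens inside an action. The sketch below is a plain-Python toy model of what goes wrong in a transformation, not Spark code; `ToyAccumulator` and `run_transformation` are hypothetical names used only for illustration.

```python
class ToyAccumulator:
    """Hypothetical stand-in for a Spark accumulator: an add-only counter."""

    def __init__(self, value=0):
        self.value = value

    def add(self, amount):
        self.value += amount


def run_transformation(rows, acc):
    """Mimic a map-style transformation whose function has a side effect."""
    out = []
    for row in rows:
        acc.add(1)  # side effect: applied every time this code runs
        out.append(row.upper())
    return out


rows = ["java", "scala", "spark"]
acc = ToyAccumulator()

# First evaluation of the lineage
first = run_transformation(rows, acc)
# Recomputation, e.g. a second action on an uncached result or a retried task
second = run_transformation(rows, acc)

print(first == second)  # True: the data is identical on both runs
print(acc.value)        # 6: the counter was bumped twice per row
```

The data comes out the same on both runs, but the counter reports six updates for three rows, which is why a count gathered this way should be treated as approximate at best.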

User

Answer 2 · edited 2024-10-01 13:39:22

Hope this helps!

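from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, expr, concat, col
from pyspark.sql.types import StringType

# In the pyspark shell `spark` and `sc` already exist; these two lines make the snippet standalone
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Integer accumulator created on the driver, starting at 0
ac = sc.accumulator(0)

def incrementAC():
    # Runs on the executors: bump the task-local copy and return it as a string
    ac.add(1)
    return str(ac)

# sample data
df = sc.parallelize([('Java', 90), ('Scala', 95), ('Spark', 92)]).toDF(["language", "rank"])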

Method 1:

(The code block for this method is missing from the source.)

Method 2:

# Another solution if you want to use 'expr' (as rightly pointed out by @user9132725)
# Register the function under a SQL name so that expr() can resolve it
spark.udf.register("myudf", incrementAC, StringType())
df = df.withColumn("lang_and_rank", expr("concat(language, myudf())"))
df.show()

Output:

+--------+----+-------------+
|language|rank|lang_and_rank|
+--------+----+-------------+
|    Java|  90|        Java1|
|   Scala|  95|       Scala1|
|   Spark|  92|       Spark2|
+--------+----+-------------+
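The suffixes 1, 1, 2 (rather than 1, 2, 3) are consistent with the three rows being split over two partitions, with each task incrementing its own zero-initialised copy of the accumulator and the driver merging the copies only after the tasks finish. A plain-Python sketch of that behaviour follows; the partition layout and the helper name `run_task` are assumptions for illustration, since the source does not state how the data was partitioned.

```python
def run_task(partition):
    """Simulate one task: it starts from its own zero-valued counter."""
    local_count = 0
    results = []
    for language in partition:
        local_count += 1  # like ac.add(1) on the task-local copy
        results.append(f"{language}{local_count}")  # like concat(language, myudf())
    return results, local_count


# Assumed layout: three rows spread over two partitions
partitions = [["Java"], ["Scala", "Spark"]]

rows = []
driver_total = 0
for partition in partitions:
    results, local_count = run_task(partition)
    rows.extend(results)
    driver_total += local_count  # local counts are merged on the driver afterwards

print(rows)          # ['Java1', 'Scala1', 'Spark2']
print(driver_total)  # 3
```

Under this assumption the value a UDF sees is only a per-task running count, so the column is not a global row number; the merged total is meaningful only on the driver.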
