How to use a pandas UDF in PySpark and return the result as a StructType

Posted 2024-09-27 04:10:34


How can I derive a column in PySpark using a pandas UDF? The UDF I wrote is below:

from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("in_type string, in_var string, in_numer int", PandasUDFType.GROUPED_MAP)
def getSplitOP(in_data):
    if in_data is None or len(in_data) < 1:
        return None
    # Sample input: Input/variable.12-2017
    splt=in_data.split("/",1)
    in_type=splt[0]

    splt_1=splt[1].split(".",1)
    in_var = splt_1[0]

    splt_2=splt_1[1].split("-",1)
    in_numer=int(splt_2[0])

    return (in_type, in_var, in_numer)
    #Expected output: ("input", "variable", 12)

df = df.withColumn("splt_col", getSplitOP(df.In_data))

Could someone help me figure out what is wrong with the code above and why it does not work?
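
Two things in the snippet above look like probable causes. A `PandasUDFType.GROUPED_MAP` UDF receives a whole pandas DataFrame per group and is applied through `groupBy(...).apply(...)`, so it cannot be passed to `withColumn`. A pandas UDF is also called with a `pandas.Series` of many values, not with a single string, so `in_data.split(...)` and the `len(in_data) < 1` check do not behave as intended. The following is a minimal sketch, not a definitive fix. It assumes Spark 3.0 or later with pyarrow installed, where a pandas UDF with `Series -> DataFrame` type hints can return a struct column. The names `split_op` and `add_split_column` are hypothetical, and the regular expression is an assumption about the input format based on the sample value `Input/variable.12-2017`.

```python
import pandas as pd


def split_op(in_data: pd.Series) -> pd.DataFrame:
    """Split values such as 'Input/variable.12-2017' into three columns."""
    # The named groups become the output column names, which must match
    # the field names of the struct schema declared for the UDF.
    # Rows that do not match the pattern come back as missing values.
    parts = in_data.str.extract(
        r"^(?P<in_type>[^/]*)/(?P<in_var>[^.]*)\.(?P<in_numer>\d+)"
    )
    # The extracted digits are still strings; convert them to numbers.
    parts["in_numer"] = pd.to_numeric(parts["in_numer"])
    return parts


def add_split_column(df, source_col="In_data", target_col="splt_col"):
    """Attach the struct column to a Spark DataFrame (hypothetical helper)."""
    # Imported here so the pandas logic above can be tried without Spark.
    from pyspark.sql.functions import pandas_udf

    # With Series -> DataFrame type hints, Spark 3.x treats this as a
    # scalar pandas UDF whose result is a struct with the given fields.
    split_udf = pandas_udf(
        split_op, returnType="in_type string, in_var string, in_numer int"
    )
    return df.withColumn(target_col, split_udf(df[source_col]))
```

With this sketch, `df = add_split_column(df)` would take the place of the original `withColumn` call, and the fields could then be read as `splt_col.in_type`, `splt_col.in_var` and `splt_col.in_numer`. Note that the split keeps the original casing, so the sample value yields `Input` rather than `input`.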
