Error when trying to run a user-defined function on a DataFrame in PySpark

Posted 2024-09-27 21:25:27

Male | A programmer who enjoys writing Python code.

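I'm writing a small program in PySpark in which I want to create a user-defined function (UDF) that, inside 'method0', calls 'method1' through a lambda.

I have simplified the code to make it easier to follow, but the core idea is this: for every row of the DataFrame, 'method0' applies 'method1' (with the help of a lambda) to return a value based on the values of the row being checked. If the first condition in 'method1' is met (ATR1 is not '-'), the new value for that row should be its ATR1 value; if not, it should be 'other'.

The idea is to get a column out of this UDF and append it to the DataFrame inside 'method0'. Here is the simplified code: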

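from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def method1(atr_list, instance, ident):
    # 'instance' is meant to be one row of the DataFrame
    if instance.ATR1 != '-':
        return instance.ATR1
    else:
        # Other operations ...
        return 'other'

def method0(df, atr_example_list, ident):
    # Wrap method1 in a UDF; the extra arguments are captured by the lambda
    udf_func = udf(lambda instance: method1(atr_example_list, instance, ident), returnType=StringType())
    new_column = udf_func(df)  # the whole DataFrame is passed to the UDF here
    df = df.withColumnRenamed("New_Column", new_column)
    return df

result = method0(df, list, "1111")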

But when I run this code I get the following error, and I really don't understand why:

(error message not preserved)

Here is an example of the input and the expected output:

DataFrame 'df':

+-------+-------+-------+
| ATR1  |  ATR2 | ATRN  |
+-------+-------+-------+
| '-'   |   1   |  'a'  |
| '-'   |   1   |  'a'  |
| '-'   |   2   |  'b'  | 
| '++'  |   1   |  'a'  |
+-------+-------+-------+

When the DataFrame 'df' is passed to 'method0' (for this simplified example you can ignore the parameters 'atr_example_list' and 'ident'), I want the calls to 'method1' to produce a column like this:

+------------+
| new_column |
+------------+
|   'other'  |
|   'other'  |
|   'other'  |
|    '++'    |
+------------+

So the new DataFrame returned by 'method0' would be:

+-------+-------+-------+------------+
| ATR1  |  ATR2 | ATRN  | new_column |
+-------+-------+-------+------------+
| '-'   |   1   |  'a'  |   'other'  |
| '-'   |   1   |  'a'  |   'other'  |
| '-'   |   2   |  'b'  |   'other'  | 
| '++'  |   1   |  'a'  |    '++'    |
+-------+-------+-------+------------+

Can anyone help me?


1 answer

Anonymous user

Reply #1 · Posted 2024-09-27 21:25:27

Can't you simplify it and use a UDF like this instead (method1 can take several columns if needed)? For example:

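from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def method1(x):
    # Keep the value unless it is the placeholder '-'
    if x != "-":
        return x
    else:
        return 'other'

# Register method1 as a UDF that returns a string
u_method1 = udf(method1, StringType())

# Apply the UDF to the ATR1 column and append the result as a new column
result = df.withColumn("new_column", u_method1("ATR1"))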
