pyarrow.lib.ArrowNotImplementedError

Posted 2024-06-28 20:08:09


pyarrow version is 0.17.1
pandas version is 1.0.4
pyspark 2.3.4

I have a PySpark DataFrame with 4 columns, as shown below:

  A      B       C        D
-----  -----   -----    -----
  1      2       3      My Name is cat
  2      4       5      I like to code

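I need to add another column, "prediction", to the DataFrame based on column "D". The model takes a single-column DataFrame as input and returns a NumPy array, so I wrote a UDF to do this: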

import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import ArrayType, BooleanType

@pandas_udf(ArrayType(BooleanType()), PandasUDFType.SCALAR)
def t_func(pdf):
    # classifier.predict returns one row of class scores per input,
    # e.g. array([[0.706, 0.293], [0.986, 0.0713]]); keep the second column.
    predict = classifier.predict(pdf)[:, 1]
    # predict is a 1-D numpy.ndarray, e.g. array([0.293, 0.0713])
    predictions = predict > decision_threshhold
    # predictions is a 1-D boolean numpy.ndarray, e.g. array([False, False])
    return pd.Series(predictions)

X = X.withColumn('prediction', t_func('D'))

The final output should look like this:

  A      B       C        D                 prediction
-----  -----   -----    -----               -------------
  1      2       3      My Name is cat     False
  2      4       5      I like to code      False

However, I get the following error and am not sure what the problem is:

pyarrow.lib.ArrowNotImplementedError: NumPyConverter doesn't implement <list<item: bool>> conversion
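
One likely cause, offered as a guess rather than a confirmed diagnosis: the UDF is declared to return ArrayType(BooleanType()), i.e. a list of booleans per row, while the function actually returns a single boolean per row. Arrow is then asked to convert a plain boolean Series into list<item: bool>, which matches the type named in the error. The sketch below declares BooleanType() instead. StubClassifier and DECISION_THRESHOLD are hypothetical stand-ins for the post's classifier and decision_threshhold, which are not shown; the PySpark registration is guarded so the thresholding logic can be checked with pandas alone.

```python
import numpy as np
import pandas as pd

# Hypothetical value; the post does not state its threshold.
DECISION_THRESHOLD = 0.5


class StubClassifier:
    """Hypothetical stand-in for the post's `classifier`."""

    def predict(self, texts):
        # Return an (n, 2) array of class scores, the shape the post describes.
        # The score here is just a toy function of the text length.
        p1 = np.array([min(len(t), 100) / 100.0 for t in texts])
        return np.column_stack([1.0 - p1, p1])


classifier = StubClassifier()


def predict_flags(texts: pd.Series) -> pd.Series:
    """Return exactly one boolean per input row."""
    scores = classifier.predict(texts)[:, 1]  # 1-D array, shape (n,)
    return pd.Series(scores > DECISION_THRESHOLD, index=texts.index)


try:
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import BooleanType

    # One bool per row, so the declared return type is BooleanType(),
    # not ArrayType(BooleanType()).
    t_func = pandas_udf(predict_flags, BooleanType(), PandasUDFType.SCALAR)
    # Usage on the post's DataFrame (assumed to be named X):
    # X = X.withColumn("prediction", t_func("D"))
except ImportError:
    # PySpark (or pyarrow) is not available; predict_flags still runs on pandas.
    t_func = None

# Local check with the two example rows from the post.
sample = pd.Series(["My Name is cat", "I like to code"])
result = predict_flags(sample)
print(result)
```

If a list column were really wanted, each element of the returned Series would have to be a list itself, and whether ArrayType is accepted as a pandas UDF return type depends on the Spark and pyarrow versions in use.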

