Grouping a DataFrame with Apache

Posted 2024-10-02 12:23:30

Male | Programmer who enjoys writing Python code.

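from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import (StructType, StructField, StringType,
                               ArrayType, IntegerType, TimestampType)

schema = StructType([
    StructField("title", StringType(), False),
    StructField("stringdataA", StringType(), False),
    # Uncommenting this array-of-struct field produces the error below:
    # StructField("list", ArrayType(StructType([
    #     StructField("A", IntegerType(), False),
    #     StructField("B", StringType(), False),
    #     StructField("C", TimestampType(), False)
    # ]))),
    StructField("stringdataB", StringType(), False)])

@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def make_data(x):
    # Build and return a pandas DataFrame whose columns fit `schema`
    ...

groupedList = df.groupby("groupkey").apply(make_data)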

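The make_data function generates data that fits the schema I defined, but when I add a list(map())-style field to the schema (the commented-out "list" field, an array of structs), I get the error below. Is this schema structure really not supported?

Is there any way to get list(map())-structured data out of the UDF so that I can work with it?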

NotImplementedError: Invalid returnType with grouped map Pandas UDFs: StructType(List(StructField(title,StringType,false),StructField(stringdataA,StringType,false),StructField(list,ArrayType(StructType(List(StructField(A,IntegerType,false),StructField(B,StringType,false),StructField(C,TimestampType,false))),true),true),StructField(stringdataB,StringType,false))) is not supported


1 answer

Anonymous user

Reply #1 · Posted 2024-10-02 12:23:30

I think the elements of your list are a StructType, which is not supported:

https://github.com/apache/spark/blob/4a4e7aeca79738d5788628d67d97d704f067e8d7/python/pyspark/sql/types.py#L1581

To confirm this, try calling pyspark.sql.types.to_arrow_schema(schema) and see what happens.
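A common workaround, not part of the original answer, is to return the nested list from the UDF as a JSON string (which fits a supported StringType field) and parse it back into an array of structs afterwards with pyspark.sql.functions.from_json. The sketch below covers only the encoding step in plain Python. encode_struct_list and decode_struct_list are hypothetical helpers, naive timestamps are assumed to be UTC, and the Spark-side calls appear only as comments. Inside make_data you would fill the "list" column with encode_struct_list(...) for each output row.

```python
import json
from datetime import datetime, timezone

# Hypothetical helpers: keep the nested list as a JSON string so the
# grouped-map return schema only needs StringType for that field, e.g.
#   StructField("list", StringType(), False)


def encode_struct_list(records):
    """Serialize [{"A": int, "B": str, "C": datetime}, ...] to a JSON string."""
    encoded = []
    for rec in records:
        ts = rec["C"]
        if ts.tzinfo is None:
            # Assumption: naive datetimes are UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        encoded.append({
            "A": int(rec["A"]),
            "B": str(rec["B"]),
            # e.g. "2024-10-02T12:23:30.000+00:00", intended to match Spark's
            # default JSON timestampFormat (adjust if your version differs)
            "C": ts.astimezone(timezone.utc).isoformat(timespec="milliseconds"),
        })
    return json.dumps(encoded)


def decode_struct_list(text):
    """Inverse of encode_struct_list, for checking the round trip locally."""
    return [
        {"A": r["A"], "B": r["B"], "C": datetime.fromisoformat(r["C"])}
        for r in json.loads(text)
    ]


# Spark side (sketch only, not executed here):
#   from pyspark.sql import functions as F
#   item = StructType([StructField("A", IntegerType()),
#                      StructField("B", StringType()),
#                      StructField("C", TimestampType())])
#   groupedList = groupedList.withColumn(
#       "list", F.from_json("list", ArrayType(item)))
```

The idea is that from_json runs as an ordinary Spark SQL expression after apply, so the array-of-struct column never has to pass the grouped-map UDF's return-type check. If timestamps do not parse, the timestampFormat option of from_json is the place to adjust.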
