Converting a PySpark DataFrame to a dictionary: result differs from what I expected

data = [
    ("USA", 20, 40, 60),
    ("India", 50, 40, 30),
    ("Nepal", 20, 50, 30),
    ("Ireland", 40, 60, 70),
    ("Norway", 50, 50, 60),
]
columns = ["country", "A", "B", "C"]
df = spark.createDataFrame(data=data, schema=columns)

{'USA': {'country': 'USA', 'A': 20, 'B': 40, 'C': 60}, 'India': {'country': 'India', 'A': 50, 'B': 40, 'C': 30}, 'Nepal': {'country': 'Nepal', 'A': 20, 'B': 50, 'C': 30}, 'Ireland': {'country': 'Ireland', 'A': 40, 'B': 60, 'C': 70}, 'Norway': {'country': 'Norway', 'A': 50, 'B': 50, 'C': 60}}

2 answers

User

Reply 1 · edited 2024-06-01 19:07:54

You can use a dict comprehension to drop the items you don't want:

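# collect() brings every row to the driver; asDict() turns each Row into a plain dict
list_test = [row.asDict() for row in df.collect()]
# key each entry by country and leave 'country' out of the inner dict
dict_test = {country['country']: {k: v for k, v in country.items() if k != 'country'} for country in list_test}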

print(dict_test)
{'USA': {'A': 20, 'B': 40, 'C': 60}, 'India': {'A': 50, 'B': 40, 'C': 30}, 'Nepal': {'A': 20, 'B': 50, 'C': 30}, 'Ireland': {'A': 40, 'B': 60, 'C': 70}, 'Norway': {'A': 50, 'B': 50, 'C': 60}}
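If you want to try the comprehension without a Spark session, here is a minimal sketch in plain Python. It assumes `list_test` already holds what `row.asDict()` would return (two rows copied from the sample data above), and `key_by` is a hypothetical helper name introduced only for illustration, not part of PySpark.

```python
# Plain-Python stand-in for [row.asDict() for row in df.collect()];
# the two rows are copied from the question's sample data.
list_test = [
    {"country": "USA", "A": 20, "B": 40, "C": 60},
    {"country": "India", "A": 50, "B": 40, "C": 30},
]


def key_by(rows, key):
    """Hypothetical helper: index rows by `key` and drop `key` from each inner dict."""
    return {
        row[key]: {k: v for k, v in row.items() if k != key}
        for row in rows
    }


dict_test = key_by(list_test, "country")
print(dict_test)
```

Passing the key column as a parameter is just a convenience so the same comprehension can be reused for a different column name.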

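User

Reply 2 · edited 2024-06-01 19:07:54

Another approach is to build a JSON string inside the DataFrame with a few transformations, collect it, and then use json.loads to get the dict object: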

import json

from pyspark.sql.functions import to_json, collect_list, struct, map_from_arrays

dict_test = json.loads(
    # groupBy() with no columns aggregates the whole DataFrame into a single row
    df.groupBy().agg(
        collect_list("country").alias("countries"),
        collect_list(struct("A", "B", "C")).alias("values")
    ).select(
        # pair each country with its struct, then serialize the map to one JSON string
        to_json(map_from_arrays("countries", "values")).alias("json_str")
    ).collect()[0].json_str
)

print(dict_test)

#{'USA': {'A': 20, 'B': 40, 'C': 60}, 'India': {'A': 50, 'B': 40, 'C': 30}, 'Nepal': {'A': 20, 'B': 50, 'C': 30}, 'Ireland': {'A': 40, 'B': 60, 'C': 70}, 'Norway': {'A': 50, 'B': 50, 'C': 60}}
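To make the steps easier to follow, here is a rough plain-Python analogue of the same pipeline. It is only a sketch: `dict(zip(...))` stands in for `map_from_arrays`, `json.dumps` stands in for `to_json`, and the two lists are hand-written from the sample data rather than produced by `collect_list`. Duplicate country names are not handled here.

```python
import json

# Hand-written stand-ins for the two collect_list(...) columns
countries = ["USA", "India"]
values = [{"A": 20, "B": 40, "C": 60}, {"A": 50, "B": 40, "C": 30}]

# Pair keys and values by position, as map_from_arrays does for two arrays
paired = dict(zip(countries, values))

# Serialize to a single JSON string (the role to_json plays in the Spark version)
json_str = json.dumps(paired)

# Parse the string back into a regular Python dict
dict_test = json.loads(json_str)
print(dict_test)
```

The point of going through a string is that only one value has to be collected from Spark, and `json.loads` rebuilds the nested dict on the driver.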
