Convert a hierarchical DataFrame into a nested list of dictionaries

Desired output (each state holds its counties, each county holds its cities):

[
  {name: "state1", children: [
    {name: "county1", children: [
      {name: "city1", population: "13000"},
      {name: "city2", population: "10000"}
    ]},
    {name: "county2", children: [
      {name: "city1", population: "1000"},
      {name: "city2", population: "100000"}
    ]}
  ]},
  {name: "state2", children: [
    {name: "county1", children: [
      {name: "city1", population: "13000"},
      {name: "city2", population: "10000"}
    ]},
    {name: "county2", children: [
      {name: "city1", population: "1000"},
      {name: "city2", population: "100000"}
    ]}
  ]}
]

import pandas as pd
from benedict import benedict

# read in the data
df = pd.read_csv("C:\\Users\\m316375\\Downloads\\uscities.csv")

# Use benedict to build a nested dict: state -> county -> city -> population
# (.copy() avoids a SettingWithCopyWarning when the dict_path column is added)
df_benedict = df[["state_name", "city", "county_name", "population"]].copy()
node_id = ["state_name", "county_name", "city"]
df_benedict['dict_path'] = df[node_id].astype(str).apply('_'.join, axis=1)
d = benedict()
d.keypath_separator = '_'
for _, row in df_benedict.iterrows():
    d[row["dict_path"]] = row["population"]

##### First Attempt ########
# looping through the nested dictionary
state_children = []
city_children = []
county_children = []
full_children = []
dict_list = []
counter = 0
for state, v0 in d.items():
    # print(f"state={state}, population={v0})")
    for city, v1 in v0.items():
        for county, v2 in v1.items():
            county_children.append({"name": city, "value": v2})
            counter += 1
            # print(counter)
            if counter > len(v1.items()):
                city_children.append({"name": county, "children": county_children})
                county_children = []
                counter = 0
        state_children = [{"name": city, "children": city_children}]
    dict_list.append({"name": state, "children": state_children})
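The first attempt uses counters to decide when a level is finished. It also gives the loop variables `city` and `county` the opposite levels (the keys of `v0` are counties and the keys of `v1` are cities). A recursive helper avoids that bookkeeping. This is a minimal sketch, not the poster's code. It assumes a plain nested dict shaped like the one the benedict step builds (state -> county -> city -> population). The helper name `to_children` and the sample data are hypothetical.

```python
from typing import Any


def to_children(tree: dict) -> list[dict[str, Any]]:
    """Turn {name: subtree-or-leaf} into a list of {"name": ..., ...} nodes."""
    nodes = []
    for name, value in tree.items():
        if isinstance(value, dict):
            # Inner level (state or county): recurse into the next level down.
            nodes.append({"name": name, "children": to_children(value)})
        else:
            # Leaf level (city): the value is the population.
            nodes.append({"name": name, "population": value})
    return nodes


# Hypothetical sample data shaped like the benedict result.
nested = {
    "state1": {
        "county1": {"city1": 13000, "city2": 10000},
        "county2": {"city1": 1000, "city2": 100000},
    },
}
result = to_children(nested)
print(result)
```

Every level goes through the same branch, so adding or removing a hierarchy column requires no change to the helper.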

1 answer

Anonymous user

Answer #1 · posted 2024-05-10 11:23:01

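I think I've got what you need, although it's a bit clunky. Assuming the data from the link you provided is loaded into a DataFrame df, the code is as follows: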

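First, use groupby to move the state, county and city into a MultiIndex, leaving population as the only column (df_gr is therefore a Series):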

df_gr = df.groupby(['state_name', 'county_name', 'city'])['population'].sum()
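To show the shape of df_gr, here is a sketch on a small hypothetical DataFrame. It uses the same column names as uscities.csv, but the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; the real data comes from uscities.csv.
df = pd.DataFrame({
    "state_name": ["state1", "state1", "state1", "state1", "state2"],
    "county_name": ["county1", "county1", "county2", "county2", "county1"],
    "city": ["city1", "city2", "city1", "city2", "city1"],
    "population": [13000, 10000, 1000, 100000, 500],
})

# Selecting the column before summing avoids summing unrelated columns.
# The result is a Series whose MultiIndex levels are state_name,
# county_name and city.
df_gr = df.groupby(["state_name", "county_name", "city"])["population"].sum()
print(df_gr)
```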

Then we can build the desired dictionary with a dictionary comprehension:

# .xs() is given tuple keys; recent pandas versions no longer accept lists here
resulting_dict = {
    level0: {
        level1: {
            level2: df_gr.xs((level0, level1, level2))
            for level2 in df_gr.xs((level0, level1)).reset_index().groupby(['city']).sum().index
        }
        for level1 in df_gr.xs(level0).reset_index().groupby(['county_name', 'city']).sum().index.levels[0]
    }
    for level0 in df_gr.index.levels[0]
}
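Here is a runnable sketch of the same comprehension on hypothetical sample data. It passes tuple keys to .xs(), because recent pandas versions no longer accept list keys there.

```python
import pandas as pd

# Hypothetical sample data with the uscities.csv column names.
df = pd.DataFrame({
    "state_name": ["state1", "state1", "state1", "state1", "state2"],
    "county_name": ["county1", "county1", "county2", "county2", "county1"],
    "city": ["city1", "city2", "city1", "city2", "city1"],
    "population": [13000, 10000, 1000, 100000, 500],
})
df_gr = df.groupby(["state_name", "county_name", "city"])["population"].sum()

resulting_dict = {
    level0: {
        # Full (state, county, city) key: .xs() returns the scalar population.
        level1: {
            level2: df_gr.xs((level0, level1, level2))
            for level2 in df_gr.xs((level0, level1)).reset_index().groupby(["city"]).sum().index
        }
        # Re-grouping the cross-section builds a fresh index that contains
        # only the counties that occur in this state.
        for level1 in df_gr.xs(level0).reset_index().groupby(["county_name", "city"]).sum().index.levels[0]
    }
    for level0 in df_gr.index.levels[0]
}
print(resulting_dict)
```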

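Basically, .xs() returns the cross-section of the data at the requested level. We also make sure not to loop over level combinations that don't exist. To do that, .reset_index() followed by .groupby() gets the index of the cross-section rather than that of the whole DataFrame. Calling .index.levels after .xs() returns the levels of the whole DataFrame, and I don't know of a simpler way to make it return only the cross-section's index.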
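For reference, pandas does offer shorter ways to list only the labels that occur in a cross-section: MultiIndex.unique(level=...) and MultiIndex.remove_unused_levels(). The following sketch on hypothetical sample data contrasts them with .levels.

```python
import pandas as pd

# Hypothetical sample data: state2 only contains county1.
df = pd.DataFrame({
    "state_name": ["state1", "state1", "state1", "state1", "state2"],
    "county_name": ["county1", "county1", "county2", "county2", "county1"],
    "city": ["city1", "city2", "city1", "city2", "city1"],
    "population": [13000, 10000, 1000, 100000, 500],
})
df_gr = df.groupby(["state_name", "county_name", "city"])["population"].sum()

cross = df_gr.xs("state2")  # Series indexed by (county_name, city)

# .levels keeps every county from the original index, including county2.
all_counties = list(cross.index.levels[0])

# Only the counties that actually occur under state2:
present = list(cross.index.unique(level="county_name"))
trimmed = list(cross.index.remove_unused_levels().levels[0])
print(all_counties, present, trimmed)
```

Either of the last two could replace the .reset_index().groupby() step inside the comprehension.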

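The result is a nested dict of the form {state: {county: {city: population}}}. You can adapt the dictionary comprehension to produce whatever output format you need.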
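As one example of such an adaptation, this sketch builds the name/children list format from the question directly from df_gr. It replaces the .xs() comprehension with nested groupby(level=...) loops and uses hypothetical sample data. It converts population to str only because the question's sample output uses strings. build_tree is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical sample data with the uscities.csv column names.
df = pd.DataFrame({
    "state_name": ["state1", "state1", "state1", "state1", "state2"],
    "county_name": ["county1", "county1", "county2", "county2", "county1"],
    "city": ["city1", "city2", "city1", "city2", "city1"],
    "population": [13000, 10000, 1000, 100000, 500],
})
df_gr = df.groupby(["state_name", "county_name", "city"])["population"].sum()


def build_tree(series: pd.Series) -> list[dict]:
    """Nest a (state, county, city) -> population Series into name/children dicts."""
    tree = []
    for state, s_state in series.groupby(level="state_name"):
        counties = []
        for county, s_county in s_state.groupby(level="county_name"):
            # s_county keeps the full 3-level index, so each key is a tuple.
            cities = [
                {"name": city, "population": str(pop)}
                for (_, _, city), pop in s_county.items()
            ]
            counties.append({"name": county, "children": cities})
        tree.append({"name": state, "children": counties})
    return tree


tree = build_tree(df_gr)
print(tree)
```

Iterating with groupby visits only the combinations that exist, so no separate existence check is needed.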
