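My data can be downloaded here, and it looks like this:
My goal is to build a network whose nodes are states, cities, and counties, sized by population. This will be part of an application, so the choice of node levels has to be dynamic: it can be any combination of state, city, and county. Here is the visualization I want to achieve. The data needs to look like this: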
[{name: "state1",
  children: [{name: "county1",
              children: [{name: "city1",
                          population: "13000"
                         },
                         {name: "city2",
                          population: "10000"
                         }]
             },
             {name: "county2",
              children: [{name: "city1",
                          population: "1000"
                         },
                         {name: "city2",
                          population: "100000"
                         }]
             }]
 },
 {name: "state2",
  children: [{name: "county1",
              children: [{name: "city1",
                          population: "13000"
                         },
                         {name: "city2",
                          population: "10000"
                         }]
             },
             {name: "county2",
              children: [{name: "city1",
                          population: "1000"
                         },
                         {name: "city2",
                          population: "100000"
                         }]
             }]
 }]
This is what I have tried so far:
import pandas as pd
from benedict import benedict

# Read in the data
df = pd.read_csv("C:\\Users\\m316375\\Downloads\\uscities.csv")

# Use benedict to create a nested dictionary keyed state -> county -> city
# (.copy() avoids pandas' SettingWithCopyWarning when adding a column below)
df_benedict = df[["state_name", "city", "county_name", "population"]].copy()
node_id = ["state_name", "county_name", "city"]
df_benedict['dict_path'] = df[node_id].astype(str).apply('_'.join, axis=1)

d = benedict()
d.keypath_separator = '_'
for _, row in df_benedict.iterrows():
    dict_path = row["dict_path"]
    d[dict_path] = row["population"]
##### First Attempt ########
# Loop through the nested dictionary; the key path is state_county_city,
# so the levels are hard-coded as three nested loops
dict_list = []
for state, counties in d.items():
    county_children = []
    for county, cities in counties.items():
        city_children = []
        for city, population in cities.items():
            city_children.append({"name": city,
                                  "value": population})
        county_children.append({"name": county,
                                "children": city_children})
    dict_list.append({"name": state,
                      "children": county_children})
Problem: my approach is not dynamic. If I only want to show states and cities, I have to remove one of the for loops, which is not ideal.
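I think I have what you need, although it is a bit clunky. Assuming the data from the link you provided is loaded into a dataframe df, the code is as follows. First, a groupby moves state, city, and county into a MultiIndex and leaves population as the only column: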
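The answer's original snippet for this step is not preserved here. A minimal sketch of what such a groupby could look like, assuming the column names used in the question (`state_name`, `county_name`, `city`, `population`) and a tiny made-up frame standing in for the real CSV; the level order and the use of `sum()` are assumptions:

```python
import pandas as pd

# Tiny illustrative frame standing in for uscities.csv (values are made up).
df = pd.DataFrame({
    "state_name": ["state1", "state1", "state1", "state2"],
    "county_name": ["county1", "county1", "county2", "county1"],
    "city": ["city1", "city2", "city1", "city1"],
    "population": [13000, 10000, 1000, 5000],
})

# Assumed nesting order: state -> county -> city.
levels = ["state_name", "county_name", "city"]

# Grouping on the three columns turns them into a MultiIndex;
# selecting [["population"]] keeps population as the only column.
grouped = df.groupby(levels)[["population"]].sum()
```

If every (state, county, city) combination occurs once, `sum()` simply carries the population through unchanged.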
Then we can build the desired dictionary with a dictionary comprehension:
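The comprehension itself is also missing from this copy of the answer. The following is a sketch under stated assumptions, not the answerer's code: it rebuilds a small made-up frame, uses `.xs()` for the cross-sections, and uses a hypothetical helper `names_in` that applies `.reset_index()` followed by `.groupby()` to list only the names present in a cross-section.

```python
import pandas as pd

# Tiny illustrative frame standing in for uscities.csv (values are made up).
df = pd.DataFrame({
    "state_name": ["state1", "state1", "state1", "state2"],
    "county_name": ["county1", "county1", "county2", "county1"],
    "city": ["city1", "city2", "city1", "city1"],
    "population": [13000, 10000, 1000, 5000],
})
grouped = df.groupby(["state_name", "county_name", "city"])[["population"]].sum()


def names_in(frame, level):
    """Hypothetical helper: names that actually occur at `level` in `frame`."""
    return list(frame.reset_index().groupby(level).groups)


tree = [
    {
        "name": state,
        "children": [
            {
                "name": county,
                "children": [
                    {"name": city, "population": int(pop)}
                    # Cross-section for one (state, county) pair, indexed by city.
                    for city, pop in grouped.xs(
                        (state, county), level=["state_name", "county_name"]
                    )["population"].items()
                ],
            }
            # Only counties that exist in this state's cross-section.
            for county in names_in(grouped.xs(state, level="state_name"), "county_name")
        ],
    }
    for state in names_in(grouped, "state_name")
]
```

Populations are emitted as integers here, whereas the target structure in the question shows them as strings; wrap `pop` in `str()` if the consumer expects strings.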
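Basically, we use .xs() to return a cross-section of the dataframe at the desired level. We also make sure we do not loop over level combinations that do not exist. .reset_index() followed by .groupby() is used to get the index values of the cross-section rather than those of the whole dataframe (using .index.levels after .xs() returns the levels of the whole dataframe, and I do not know a simpler way to make it return only the cross-section's index). You can tailor the dictionary comprehension to the output format you need.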
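Since the question asks for a dynamic choice of levels, one way to tailor the approach is to replace the fixed comprehension with a recursive function. This is a different technique from the `.xs()` comprehension described above, offered only as a sketch: `build_tree` is a hypothetical name, the column names come from the question, and the sample values are made up.

```python
import pandas as pd


def build_tree(frame, levels, value="population"):
    """Nest `frame` by `levels`, an ordered list of column names."""
    head, *rest = levels
    nodes = []
    # Grouping a sub-frame only yields names that exist in it,
    # so non-existent level combinations are never visited.
    for name, group in frame.groupby(head, sort=False):
        if rest:
            nodes.append({"name": name, "children": build_tree(group, rest, value)})
        else:
            # Leaf node: sum the value over all rows sharing this name.
            nodes.append({"name": name, value: int(group[value].sum())})
    return nodes


# Tiny illustrative frame standing in for uscities.csv (values are made up).
df = pd.DataFrame({
    "state_name": ["state1", "state1", "state1", "state2"],
    "county_name": ["county1", "county1", "county2", "county1"],
    "city": ["city1", "city2", "city1", "city1"],
    "population": [13000, 10000, 1000, 5000],
})

state_city = build_tree(df, ["state_name", "city"])
full = build_tree(df, ["state_name", "county_name", "city"])
```

Skipping a level merges rows that share a name at the next level: with only state and city, two same-named cities in different counties of one state become a single node whose population is their sum. Whether that is acceptable depends on the visualization.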