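While using this helpful answer by Trenton McKinney to flatten multiple nested JSON files for processing in pandas, I ran into a problem.
Following his suggestion, I used the flatten_json function from that answer (reproduced further down).
A single JSON file looks roughly like this: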
{
    "product": "example_productname",
    "product_id": "example_productid",
    "product_type": "example_producttype",
    "producer": "example_producer",
    "currency": "example_currency",
    "client_id": "example_clientid",
    "supplement": [
        {
            "supplementtype": "RTZ",
            "price": 300000,
            "rebate": "500"
        },
        {
            "supplementtype": "CVB",
            "price": 500000,
            "rebate": "250"
        },
        {
            "supplementtype": "JKL",
            "price": 100000,
            "rebate": "750"
        }
    ]
}
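With the referenced code, I get data like this:
This has several problems.
First, my data has a finite list of "supplements", but they are not always present, and when they are, they do not always appear in the same order. In the example table you can see that the two "supplements" in the second row have swapped positions. I would prefer a fixed order for the supplement columns.
Second, the best option would be a table like this:
I have tried editing the referenced flatten_json function, but I could not work out how to make it work.
The solution turned out to be simply editing the dictionary (thanks to Andrej Kesely). I only added a try/except that passes in case some columns do not exist.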
import pandas as pd

d = {
    "product": "example_productname",
    "product_id": "example_productid",
    "product_type": "example_producttype",
    "producer": "example_producer",
    "currency": "example_currency",
    "client_id": "example_clientid",
    "supplement": [
        {
            "supplementtype": "RTZ",
            "price": 300000,
            "rebate": "500",
        },
        {
            "supplementtype": "CVB",
            "price": 500000,
            "rebate": "250",
        },
        {
            "supplementtype": "JKL",
            "price": 100000,
            "rebate": "750",
        },
    ],
}

# copy each supplement's fields to top-level keys named after its type
for s in d["supplement"]:
    try:
        d["supplementtype_{}_price".format(s["supplementtype"])] = s["price"]
    except KeyError:
        # this supplement has no price
        pass
    try:
        d["supplementtype_{}_rebate".format(s["supplementtype"])] = s["rebate"]
    except KeyError:
        # this supplement has no rebate
        pass

# the nested list is no longer needed
del d["supplement"]

df = pd.DataFrame([d])
print(df)
product product_id product_type producer currency client_id supplementtype_RTZ_price supplementtype_RTZ_rebate supplementtype_CVB_price supplementtype_CVB_rebate supplementtype_JKL_price supplementtype_JKL_rebate
0 example_productname example_productid example_producttype example_producer example_currency example_clientid 300000 500 500000 250 100000 750
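To get the fixed column order asked for in the question, one option is to list the expected supplement types up front and reindex the final DataFrame against them. The following is only a minimal sketch under some assumptions: the helper name widen_supplements, the SUPPLEMENT_TYPES list, and the two shortened sample records are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Assumption: the finite list of supplement types is known in advance.
SUPPLEMENT_TYPES = ["RTZ", "CVB", "JKL"]
FIELDS = ["price", "rebate"]


def widen_supplements(record: dict) -> dict:
    """Return a copy of record with each supplement spread into its own keys."""
    flat = {k: v for k, v in record.items() if k != "supplement"}
    for s in record.get("supplement", []):
        for field in FIELDS:
            if field in s:  # skip fields that are missing instead of raising
                flat[f"supplementtype_{s['supplementtype']}_{field}"] = s[field]
    return flat


# Hypothetical records: supplements arrive in a different order, some are missing.
records = [
    {"product": "a", "supplement": [
        {"supplementtype": "CVB", "price": 500000, "rebate": "250"},
        {"supplementtype": "RTZ", "price": 300000, "rebate": "500"},
    ]},
    {"product": "b", "supplement": [
        {"supplementtype": "JKL", "price": 100000},
    ]},
]

base_columns = ["product"]
supplement_columns = [
    f"supplementtype_{t}_{f}" for t in SUPPLEMENT_TYPES for f in FIELDS
]

df = pd.DataFrame([widen_supplements(r) for r in records])
# reindex enforces the column order and adds absent columns filled with NaN
df = df.reindex(columns=base_columns + supplement_columns)
print(df)
```

Because reindex creates any column that is missing, every file ends up with the same columns in the same order, whatever order the supplements had in the JSON.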
Code used/referenced:
import json

import pandas as pd


def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()

    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, f'{name}{i}{sep}')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# list of files
files = ['test1.json', 'test2.json']

# list to add dataframe from each file
df_list = list()

# iterate through files
for file in files:
    with open(file, 'r') as f:
        # read with json
        data = json.loads(f.read())
        # flatten_json into a dataframe and add to the dataframe list
        df_list.append(pd.DataFrame.from_dict(flatten_json(data), orient='index').T)

# concat all dataframes together
df = pd.concat(df_list).reset_index(drop=True)
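flatten_json names list items by their position (supplement_0_price, supplement_1_price, ...), which is why the columns shift when the supplements change order. As an alternative to a hand-written flattener, and only as a sketch rather than what the referenced answer does, pandas' pd.json_normalize can expand the supplement list into one row per supplement, and DataFrame.pivot can then spread those rows into columns keyed by supplementtype. It assumes pandas 1.1 or later (for a list passed as the pivot index) and that every record has a supplement list with distinct types.

```python
import pandas as pd

data = {
    "product": "example_productname",
    "product_id": "example_productid",
    "product_type": "example_producttype",
    "producer": "example_producer",
    "currency": "example_currency",
    "client_id": "example_clientid",
    "supplement": [
        {"supplementtype": "RTZ", "price": 300000, "rebate": "500"},
        {"supplementtype": "CVB", "price": 500000, "rebate": "250"},
        {"supplementtype": "JKL", "price": 100000, "rebate": "750"},
    ],
}

meta = ["product", "product_id", "product_type", "producer", "currency", "client_id"]

# one row per supplement, with the product-level fields repeated on each row
long_df = pd.json_normalize(data, record_path="supplement", meta=meta)

# one row per product, one column per (field, supplementtype) pair
wide = long_df.pivot(index=meta, columns="supplementtype", values=["price", "rebate"])

# flatten the two-level column index into names like supplementtype_RTZ_price
wide.columns = [f"supplementtype_{t}_{field}" for field, t in wide.columns]
wide = wide.reset_index()
print(wide)
```

The column names depend on the supplement type rather than on its position in the list, so the order in which supplements appear in a file no longer matters.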
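You can modify the dictionary before creating the DataFrame, as in the solution shown earlier (the loop over d["supplement"]).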