Multiple dictionaries with duplicate keys but different values, and no limit on how many a column holds

Posted 2024-06-26 13:47:51


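Here is a dataset whose dictionaries can have an unlimited number of keys. The Detail column in each row may hold different product information, depending on the customer.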

ID  Name    Detail
1   Sara    [{"Personal":{"ID":"001","Name":"Sara","Type":"01","TypeName":"Book"},"Order":[{"ID":"0001","Date":"20200222","ProductID":"C0123","ProductName":"ABC", "Price":"4"}]},{"Personal":{"ID":"001","Name":"Sara","Type":"02","TypeName":"Food"},"Order":[{"ID":"0004","Date":"20200222","ProductID":"D0123","ProductName":"Small beef", "Price":"15"}]},{"Personal":{"ID":"001","Name":"Sara","Type":"02","TypeName":"Food"},"Order":[{"ID":"0005","Date":"20200222","ProductID":"D0200","ProductName":"Shrimp", "Price":"28"}]}]
2   Frank   [{"Personal":{"ID":"002","Name":"Frank","Type":"02","TypeName":"Food"},"Order":[{"ID":"0008","Date":"20200228","ProductID":"D0288","ProductName":"Salmon", "Price":"24"}]}]

My expected output is:

ID Name Personal_ID Personal_Name Personal_Type Personal_TypeName Personal_Order_ID Personal_Order_Date Personal_Order_ProductID Personal_Order_ProductName Personal_Order_Price    
1  Sara 001         Sara          01            Book              0001              20200222            C0123                    ABC                          4    
2  Sara 001         Sara          02            Food              0004              20200222            D0123                    Small beef                   15
3  Sara 001         Sara          02            Food              0005              20200222            D0200                    Shrimp                       28
4  Frank 002        Frank         02            Food              0008              20200228            D0288                    Salmon                       24

Tags: frank, name, id, date, food, type, order, price
3 Answers

So, basically, the Detail column holds nested JSON, which you need to break out into a DataFrame and then merge with the original DataFrame.

import json

import pandas as pd

# Collect one flattened DataFrame per row; DataFrame.append was removed in pandas 2.0,
# so the pieces are concatenated once at the end instead
frames = []
# We need to loop over each row to read its JSON
for _, row in df.iterrows():
    # Parse the JSON, flatten each Order record, and carry the Personal fields along as metadata
    frames.append(pd.json_normalize(json.loads(row['Detail']), record_path='Order', meta=[['Personal','ID'], ['Personal','Name'], ['Personal','Type'], ['Personal','TypeName']]))
detailDf = pd.concat(frames, ignore_index=True)

# You don't really need the rest of the code, since the columns Personal.Name
# and Personal.ID already hold the same information, but nonetheless:

# Merge on name and ID (Personal.ID is a string such as "001", so cast it to int)
df = df.merge(detailDf, how='right', left_on=[df['Name'], df['ID']], right_on=[detailDf['Personal.Name'], detailDf['Personal.ID'].astype(int)])

# Clean up: ID_x is the original row ID, ID_y is the order ID; key_0 and key_1 are the merge keys
df.rename(columns={'ID_x': 'ID', 'ID_y': 'Personal_Order_ID'}, inplace=True)
df.drop(columns=['Detail', 'key_1', 'key_0'], inplace=True)

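If you read my comments closely, I recommend using detailDf as your final DataFrame, since the merge really is unnecessary and that information is already in the Detail JSON.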
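Following the suggestion above to treat detailDf as the final result, here is a minimal sketch that skips the merge entirely. It assumes the Detail column holds JSON strings (as the json.loads call above implies). The sample frame, the name detail_df, and the renaming to the asker's column names are illustrative assumptions, not part of the original answer.

```python
import json

import pandas as pd

# Hypothetical sample frame mirroring the question; Detail holds JSON strings.
df = pd.DataFrame({
    "ID": [1, 2],
    "Name": ["Sara", "Frank"],
    "Detail": [
        json.dumps([
            {"Personal": {"ID": "001", "Name": "Sara", "Type": "01", "TypeName": "Book"},
             "Order": [{"ID": "0001", "Date": "20200222", "ProductID": "C0123",
                        "ProductName": "ABC", "Price": "4"}]},
            {"Personal": {"ID": "001", "Name": "Sara", "Type": "02", "TypeName": "Food"},
             "Order": [{"ID": "0004", "Date": "20200222", "ProductID": "D0123",
                        "ProductName": "Small beef", "Price": "15"}]},
        ]),
        json.dumps([
            {"Personal": {"ID": "002", "Name": "Frank", "Type": "02", "TypeName": "Food"},
             "Order": [{"ID": "0008", "Date": "20200228", "ProductID": "D0288",
                        "ProductName": "Salmon", "Price": "24"}]},
        ]),
    ],
})

# Gather every entry from every row, then flatten them in one call:
# one output row per Order record, with the Personal fields as metadata.
records = [entry for detail in df["Detail"] for entry in json.loads(detail)]
detail_df = pd.json_normalize(
    records,
    record_path="Order",
    meta=[["Personal", "ID"], ["Personal", "Name"],
          ["Personal", "Type"], ["Personal", "TypeName"]],
    record_prefix="Personal_Order_",
)
# Meta columns come out as "Personal.ID" and so on; switch to underscores.
detail_df.columns = detail_df.columns.str.replace(".", "_", regex=False)
# Running row number, as in the expected output.
detail_df.insert(0, "ID", list(range(1, len(detail_df) + 1)))
print(detail_df)
```

Because record_path walks the whole Order list, this variant should also cope with entries that carry more than one order. The Name column from the expected output is not added here; it would duplicate Personal_Name.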

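First, you need a function that processes the list of dicts in each row's Detail column. Put simply, pandas can treat a list of dicts as a DataFrame. So all I do here is turn the Personal and Order dicts in each row into DataFrames that can be merged entry by entry when this function is applied: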

import pandas as pd

def processdicts(x):
    # x is the list of dicts stored in one Detail cell
    personal = pd.DataFrame([entry['Personal'] for entry in x])
    personal = personal.rename(columns={"ID": "Personal_ID"})
    personal['Personal_Name'] = personal['Name']
    # Only the first order of each entry is used, as in the sample data
    orders = pd.DataFrame([entry['Order'][0] for entry in x])
    orders = orders.rename(columns={"ID": "Order_ID"})

    # Pair each order with its Personal record by position
    personDf = orders.merge(personal, left_index=True, right_index=True)
    return personDf

Create an empty DataFrame that will hold the compiled data:

outcome=pd.DataFrame(columns=[],index=[])

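Now use the function we created above to process the data in each row of the DataFrame. A simple for loop is used here to show the flow. The apply function could be called instead for efficiency, at the cost of a slight change to the concat step. With an empty DataFrame on hand to collect each row's data, the for loop is very simple, just these 2 lines: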

for details in yourdataframe['Detail']:
    outcome=pd.concat([outcome,processdicts(details)])

Finally, reset the index:

outcome=outcome.reset_index(drop=True)

You can rename the columns as required in the final DataFrame. For example:

outcome=outcome.rename(columns={"TypeName": "Personal_TypeName","ProductName":"Personal_Order_ProductName","ProductID":"Personal_Order_ProductID","Price":"Personal_Order_Price","Date":"Personal_Order_Date","Order_ID":"Personal_Order_ID","Type":"Personal_Type"})

Order (or skip) the columns as required:

outcome=outcome[['Name','Personal_ID','Personal_Name','Personal_Type','Personal_TypeName','Personal_Order_ID','Personal_Order_Date','Personal_Order_ProductID','Personal_Order_ProductName','Personal_Order_Price']]

Give the DataFrame's index a name:

outcome.index.name='ID'

This should help.
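The answer above mentions that apply could replace the for loop with a small change to the concat step. A minimal sketch of that variant follows. It assumes Detail holds Python lists of dicts; process_entries is a hypothetical, simplified stand-in for processdicts, and the sample data is trimmed down for illustration.

```python
import pandas as pd


def process_entries(entries):
    """Hypothetical, simplified stand-in for processdicts."""
    personal = pd.DataFrame([e["Personal"] for e in entries]).add_prefix("Personal_")
    # Like the original function, only the first order of each entry is used.
    orders = pd.DataFrame([e["Order"][0] for e in entries]).add_prefix("Personal_Order_")
    # Both frames share a 0..n-1 index, so join pairs them by position.
    return orders.join(personal)


# Hypothetical sample data: Detail holds Python lists of dicts.
df = pd.DataFrame({
    "Name": ["Sara", "Frank"],
    "Detail": [
        [{"Personal": {"ID": "001", "Name": "Sara"}, "Order": [{"ID": "0001", "Price": "4"}]},
         {"Personal": {"ID": "001", "Name": "Sara"}, "Order": [{"ID": "0004", "Price": "15"}]}],
        [{"Personal": {"ID": "002", "Name": "Frank"}, "Order": [{"ID": "0008", "Price": "24"}]}],
    ],
})

# apply returns a Series whose values are DataFrames; concatenate them once.
frames = df["Detail"].apply(process_entries)
outcome = pd.concat(frames.tolist(), ignore_index=True)
outcome.index.name = "ID"
print(outcome)
```

Concatenating once at the end avoids growing a DataFrame inside the loop, and ignore_index=True makes the separate reset_index step unnecessary.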

You can use DataFrame.explode to put each element of the list in Detail on its own row, and then apply Shubham Sharma's answer:

import io
import pandas as pd


#Creating dataframe:
s_e='''
ID    Name
1   Sara    
2   Frank    
'''

df = pd.read_csv(io.StringIO(s_e), sep=r'\s\s+', engine='python')
df['Detail']=[[{"Personal":{"ID":"001","Name":"Sara","Type":"01","TypeName":"Book"},"Order":[{"ID":"0001","Date":"20200222","ProductID":"C0123","ProductName":"ABC", "Price":"4"}]},{"Personal":{"ID":"001","Name":"Sara","Type":"02","TypeName":"Food"},"Order":[{"ID":"0004","Date":"20200222","ProductID":"D0123","ProductName":"Small beef", "Price":"15"}]},{"Personal":{"ID":"001","Name":"Sara","Type":"02","TypeName":"Food"},"Order":[{"ID":"0005","Date":"20200222","ProductID":"D0200","ProductName":"Shrimp", "Price":"28"}]}],[{"Personal":{"ID":"002","Name":"Frank","Type":"02","TypeName":"Food"},"Order":[{"ID":"0008","Date":"20200228","ProductID":"D0288","ProductName":"Salmon", "Price":"24"}]}]]

#using explode
df = df.explode('Detail').reset_index()
df['Detail']=df['Detail'].apply(lambda x: [x])
print('using explode:', df)

#retrieved from @Shubham Sharma's answer:
personal = df['Detail'].str[0].str.get('Personal').apply(pd.Series).add_prefix('Personal_')

order = df['Detail'].str[0].str.get('Order').str[0].apply(pd.Series).add_prefix('Personal_Order_')

result = pd.concat([df[['ID', "Name"]], personal, order], axis=1)

#reset ID
result['ID']=[i+1 for i in range(len(result.index))]
print(result)

Output:

#Using explode:
    index  ID   Name                                                                                               Detail
0      0   1   Sara  [{'Personal': {'ID': '001', 'Name': 'Sara', 'Type': '01', 'TypeName': 'Book'}, 'Order': [{'ID': '0001', 'Date': '20200222', 'ProductID': 'C0123', 'ProductName': 'ABC', 'Price': '4'}]}]
1      0   1   Sara  [{'Personal': {'ID': '001', 'Name': 'Sara', 'Type': '02', 'TypeName': 'Food'}, 'Order': [{'ID': '0004', 'Date': '20200222', 'ProductID': 'D0123', 'ProductName': 'Small beef', 'Price': '15'}]}]
2      0   1   Sara  [{'Personal': {'ID': '001', 'Name': 'Sara', 'Type': '02', 'TypeName': 'Food'}, 'Order': [{'ID': '0005', 'Date': '20200222', 'ProductID': 'D0200', 'ProductName': 'Shrimp', 'Price': '28'}]}]
3      1   2  Frank  [{'Personal': {'ID': '002', 'Name': 'Frank', 'Type': '02', 'TypeName': 'Food'}, 'Order': [{'ID': '0008', 'Date': '20200228', 'ProductID': 'D0288', 'ProductName': 'Salmon', 'Price': '24'}]}]




#result:
   ID Name Personal_ID Personal_Name Personal_Type Personal_TypeName Personal_Order_ID Personal_Order_Date Personal_Order_ProductID Personal_Order_ProductName Personal_Order_Price    
0   1  Sara 001         Sara          01            Book              0001              20200222            C0123                    ABC                          4    
1   2  Sara 001         Sara          02            Food              0004              20200222            D0123                    Small beef                   15
2   3  Sara 001         Sara          02            Food              0005              20200222            D0200                    Shrimp                       28
3   4  Frank 002        Frank         02            Food              0008              20200228            D0288                    Salmon                       24
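The code above reads only the first order of each entry (the .str[0] after 'Order'). If an entry can carry several orders, one possible extension is to explode a second time on the Order list. This is a sketch under that assumption, not part of the original answer; the sample data is hypothetical and trimmed down.

```python
import pandas as pd

# Hypothetical data: the first entry carries two orders.
df = pd.DataFrame({
    "Name": ["Sara", "Frank"],
    "Detail": [
        [{"Personal": {"ID": "001", "Type": "02"},
          "Order": [{"ID": "0004", "ProductName": "Small beef"},
                    {"ID": "0005", "ProductName": "Shrimp"}]}],
        [{"Personal": {"ID": "002", "Type": "02"},
          "Order": [{"ID": "0008", "ProductName": "Salmon"}]}],
    ],
})

# First explode: one row per entry in the Detail list.
entries = df.explode("Detail", ignore_index=True)
entries["Personal"] = entries["Detail"].apply(lambda d: d["Personal"])
entries["Order"] = entries["Detail"].apply(lambda d: d["Order"])

# Second explode: one row per order inside each entry.
orders = entries.explode("Order", ignore_index=True)

# Build the flat columns from the lists of dicts.
personal = pd.DataFrame(orders["Personal"].tolist()).add_prefix("Personal_")
order = pd.DataFrame(orders["Order"].tolist()).add_prefix("Personal_Order_")
result = pd.concat([orders[["Name"]], personal, order], axis=1)
print(result)
```

Building the frames from a list of dicts is used here in place of apply(pd.Series); both give the same columns for this data.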
