How to merge two data frames based on shared information using Python pandas?

Posted 2024-06-26 02:01:43


Given two data frames, df1 and df2, containing item_id-rating and item_id-class information respectively:

df1:

B0006IYIMW 5.0
B000A56PUO 3.0
B000AMLQQU 4.0
B000OVNMGE 1.0

df2:

B0006IYIMW iphone
B000OVNMGE samsung
B000AMLQQU htc
B000A56PUO nokia

I want to merge df1 and df2 to get the complete item_id-class-rating information, so the resulting data frame should be:

B0006IYIMW iphone 5.0
B000A56PUO nokia 3.0
B000AMLQQU htc 4.0
B000OVNMGE samsung 1.0

Note that the rows of the two data frames may be in different orders.

Can you tell me how to do this? Thanks in advance!


3 answers

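As usual, when I couldn't find a solution I started working on it myself, and by the time I had produced plenty of bad results and finally found the right one, others had already posted a one-line solution :) Anyway, here it is: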

import pandas as pd
# assumes df1 has columns 'item_id' and 'rating', df2 has 'item_id' and 'class',
# both frames have a default RangeIndex, and item_ids are unique
# for each row of df2, find the position of the same item_id in df1
df1_ids = df1['item_id'].tolist()
df1_index = [df1_ids.index(item) for item in df2['item_id']]
# reorder df1 so its rows line up with df2, then reset the index
df1 = df1.iloc[df1_index].reset_index(drop=True)
# now you can simply add the missing column
df2['rating'] = df1['rating']
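The manual position lookup above can also be expressed with `Series.map`, which looks up each `item_id` of df2 in a Series of ratings indexed by `item_id`. A minimal sketch, assuming the column names `item_id`, `rating` and `class` and unique item IDs in df1:

```python
import pandas as pd

# sample frames from the question; df2 is in a different row order
df1 = pd.DataFrame({'item_id': ['B0006IYIMW', 'B000A56PUO', 'B000AMLQQU', 'B000OVNMGE'],
                    'rating': [5.0, 3.0, 4.0, 1.0]})
df2 = pd.DataFrame({'item_id': ['B0006IYIMW', 'B000OVNMGE', 'B000AMLQQU', 'B000A56PUO'],
                    'class': ['iphone', 'samsung', 'htc', 'nokia']})

# Series of ratings indexed by item_id; map() looks up each df2 item_id in it
ratings = df1.set_index('item_id')['rating']
df2['rating'] = df2['item_id'].map(ratings)
print(df2)
```

An `item_id` that is missing from df1 gets `NaN` here, whereas `list.index` in the loop version raises a `ValueError`.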

Setup

import pandas as pd

idx = pd.Index(['B0006IYIMW', 'B000A56PUO', 'B000AMLQQU', 'B000OVNMGE'],
               name='item-id')
df1 = pd.DataFrame([5., 3., 4., 1.],
                   columns=['rating'], index=idx)
df2 = pd.DataFrame(['iphone', 'samsung', 'htc', 'nokia'],
                   columns=['class'], index=idx)

Solution

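Since df1 and df2 in the setup share the `item-id` index, pandas can combine them by index label. A minimal sketch using `DataFrame.join` (not necessarily the answerer's exact code); df2 is reversed here only to show that row order does not matter:

```python
import pandas as pd

idx = pd.Index(['B0006IYIMW', 'B000A56PUO', 'B000AMLQQU', 'B000OVNMGE'],
               name='item-id')
df1 = pd.DataFrame([5., 3., 4., 1.], columns=['rating'], index=idx)
# reverse df2's rows so the two frames are not in the same order
df2 = pd.DataFrame(['iphone', 'samsung', 'htc', 'nokia'],
                   columns=['class'], index=idx).iloc[::-1]

# join() aligns on index labels, not positions; sort_index() restores item-id order
df = df2.join(df1).sort_index()
print(df)
```

`pd.concat([df2, df1], axis=1)` also aligns on the index and is another option.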

Demo

print(df)

              class  rating
item-id                    
B0006IYIMW   iphone     5.0
B000A56PUO  samsung     3.0
B000AMLQQU      htc     4.0
B000OVNMGE    nokia     1.0

Try this:

import pandas as pd

df1 = pd.DataFrame([['B0006IYIMW',5.0],['B000A56PUO', 3.0],['B000AMLQQU', 4.0],['B000OVNMGE', 1.0]],columns=('item_id','rating'))
df2 = pd.DataFrame([['B0006IYIMW','iphone'],['B000A56PUO', 'nokia'],['B000AMLQQU', 'htc'],['B000OVNMGE', 'samsung']],columns=('item_id','class'))

df_merged = df1.merge(df2,on='item_id')

print(df_merged)
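If the two frames might not contain exactly the same item IDs, `merge` can also report the mismatch. A sketch using the `how`, `indicator` and `validate` parameters of `DataFrame.merge`; the item `B000EXAMPLE` is hypothetical, added only to produce an unmatched row:

```python
import pandas as pd

df1 = pd.DataFrame({'item_id': ['B0006IYIMW', 'B000A56PUO', 'B000AMLQQU', 'B000OVNMGE'],
                    'rating': [5.0, 3.0, 4.0, 1.0]})
# df2 in a different order, plus a hypothetical item that has no rating
df2 = pd.DataFrame({'item_id': ['B0006IYIMW', 'B000OVNMGE', 'B000AMLQQU', 'B000A56PUO', 'B000EXAMPLE'],
                    'class': ['iphone', 'samsung', 'htc', 'nokia', 'other']})

# how='outer' keeps unmatched rows; indicator=True adds a '_merge' column;
# validate='one_to_one' raises MergeError if an item_id repeats in either frame
df_merged = df1.merge(df2, on='item_id', how='outer',
                      indicator=True, validate='one_to_one')
print(df_merged)

# rows whose item_id appears in only one of the two frames
print(df_merged[df_merged['_merge'] != 'both'])
```

With the default `how='inner'`, as in the answer above, unmatched items are silently dropped, so the outer merge with `indicator=True` is a quick way to check coverage.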
