One-to-many join in Python: fill zeros in the duplicate records created by the one-to-many join

Posted 2024-06-26 01:37:38


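I have two pandas DataFrames, df1 and df2. The relationship between them is one-to-many, and in the merged DataFrame I need 0 in the duplicated rows instead of repeating the same value from the table on the "one" side of the relationship. df1 looks like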

Class  Section  ID   Subject    Score
 I       A       12  Maths       70
 I       A       12  Chemistry   85 
 I       A       12  Physics     75
 I       A       16  Maths       70
 I       A       16  Chemistry   85 
 I       A       16  Physics     75
 I       A       16  Arts        65
 I       B       14  Arts        60   

and df2 looks like

Class  Section  ID   Subject    Score
 I       A       12  Total       230
 I       A       16  Total       230
 I       A       16  Total       65 
 I       B       14  Total       65

I want to join on the matching columns Class, Section and ID. This is what the final table should look like after the join:

Class  Section  ID   Subject    Score  Total
 I       A       12  Maths       70    230 
 I       A       12  Chemistry   85    0
 I       A       12  Physics     75    0
 I       A       16  Maths       70    230
 I       A       16  Chemistry   85    65  
 I       A       16  Physics     75    0
 I       A       16  Arts        65    0
 I       B       14  Arts        60    60

Could you suggest how I can achieve this with Python 3.x?
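For comparison, here is a minimal sketch of a plain merge on the three key columns, using the sample data above. It repeats each Total on every matching df1 row, and because ID 16 has two rows in df2, it also multiplies the row count:

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    'Class': ['I'] * 8,
    'Section': ['A'] * 7 + ['B'],
    'ID': [12, 12, 12, 16, 16, 16, 16, 14],
    'Subject': ['Maths', 'Chemistry', 'Physics', 'Maths', 'Chemistry',
                'Physics', 'Arts', 'Arts'],
    'Score': [70, 85, 75, 70, 85, 75, 65, 60],
})
df2 = pd.DataFrame({
    'Class': ['I'] * 4,
    'Section': ['A', 'A', 'A', 'B'],
    'ID': [12, 16, 16, 14],
    'Subject': ['Total'] * 4,
    'Score': [230, 230, 65, 65],
})

cols = ['Class', 'Section', 'ID']

# Plain merge on the key columns: every df1 row is paired with every df2 row
# that shares its key, so Totals are repeated (and ID 16 rows are doubled).
naive = df1.merge(df2.drop(columns='Subject').rename(columns={'Score': 'Total'}),
                  on=cols, how='left')

print(len(df1), len(naive))  # 8 12
print(naive.loc[naive['ID'] == 12, 'Total'].tolist())  # [230, 230, 230]
```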


1 answer
Anonymous user
#1 · Posted 2024-06-26 01:37:38

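A very late answer, but the rows of each group can be enumerated with groupby().cumcount(), and that enumeration can then be used as an extra key in merge: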
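A minimal sketch of what that enumeration looks like on the sample data (only the key columns Class, Section and ID are reproduced here):

```python
import pandas as pd

cols = ['Class', 'Section', 'ID']

# Key columns only, copied from df1 and df2 in the question
df1_keys = pd.DataFrame({
    'Class': ['I'] * 8,
    'Section': ['A'] * 7 + ['B'],
    'ID': [12, 12, 12, 16, 16, 16, 16, 14],
})
df2_keys = pd.DataFrame({
    'Class': ['I'] * 4,
    'Section': ['A', 'A', 'A', 'B'],
    'ID': [12, 16, 16, 14],
})

# cumcount numbers the rows inside each (Class, Section, ID) group 0, 1, 2, ...
# in their original order; the result is aligned with the original rows.
left_pos = df1_keys.groupby(cols).cumcount()
right_pos = df2_keys.groupby(cols).cumcount()

print(left_pos.tolist())   # [0, 1, 2, 0, 1, 2, 3, 0]
print(right_pos.tolist())  # [0, 0, 1, 0]
```

Merging on the three key columns plus this position pairs, for example, df1's Chemistry row for ID 16 (position 1) with the second ID 16 row of df2 (Total 65), which is why that row receives 65 rather than 0.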

cols = ['Class', 'Section', 'ID']
df3 = (
    df1.merge(df2.drop('Subject', axis=1)  # Remove unneeded column from df2
              .rename(columns={'Score': 'Total'}),  # Fix column name for output
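              # Enumerate rows within each key group so that the n-th df2 row of a
              # key is paired with the n-th df1 row of the same key
              left_on=[*cols, df1.groupby(cols).cumcount()],
              right_on=[*cols, df2.groupby(cols).cumcount()],
              how='left')  # keep every df1 row; rows without a partner get NaN
        .drop('key_3', axis=1)  # drop the key column pandas adds for the unnamed cumcount keys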
)

df3

  Class Section  ID    Subject  Score  Total
0     I       A  12      Maths     70  230.0
1     I       A  12  Chemistry     85    NaN
2     I       A  12    Physics     75    NaN
3     I       A  16      Maths     70  230.0
4     I       A  16  Chemistry     85   65.0
5     I       A  16    Physics     75    NaN
6     I       A  16       Arts     65    NaN
7     I       B  14       Arts     60   65.0  # 65 is the Total from df2 (the expected table in the question shows 60)

Then fix the Total column with fillna and astype:

df3['Total'] = df3['Total'].fillna(0).astype(int)

df3

  Class Section  ID    Subject  Score  Total
0     I       A  12      Maths     70    230
1     I       A  12  Chemistry     85      0
2     I       A  12    Physics     75      0
3     I       A  16      Maths     70    230
4     I       A  16  Chemistry     85     65
5     I       A  16    Physics     75      0
6     I       A  16       Arts     65      0
7     I       B  14       Arts     60     65
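The .drop('key_3', axis=1) step relies on the name pandas generates for the unnamed cumcount keys. Below is a minimal sketch of an equivalent version. It stores the enumeration in an explicitly named helper column instead (the name n is an arbitrary, hypothetical choice) and folds in the fillna/astype step:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Class': ['I'] * 8,
    'Section': ['A'] * 7 + ['B'],
    'ID': [12, 12, 12, 16, 16, 16, 16, 14],
    'Subject': ['Maths', 'Chemistry', 'Physics', 'Maths', 'Chemistry',
                'Physics', 'Arts', 'Arts'],
    'Score': [70, 85, 75, 70, 85, 75, 65, 60],
})
df2 = pd.DataFrame({
    'Class': ['I'] * 4,
    'Section': ['A', 'A', 'A', 'B'],
    'ID': [12, 16, 16, 14],
    'Subject': ['Total'] * 4,
    'Score': [230, 230, 65, 65],
})

cols = ['Class', 'Section', 'ID']

# Store the within-group position in a named helper column 'n' (hypothetical name)
left = df1.assign(n=df1.groupby(cols).cumcount())
right = (df2.drop(columns='Subject')
            .rename(columns={'Score': 'Total'})
            .assign(n=df2.groupby(cols).cumcount()))

# Merge on the keys plus the position, then discard the helper column
df3 = left.merge(right, on=[*cols, 'n'], how='left').drop(columns='n')

# Rows without a matching Total are NaN (float); replace with 0 and restore ints
df3['Total'] = df3['Total'].fillna(0).astype(int)

print(df3['Total'].tolist())  # [230, 0, 0, 230, 65, 0, 0, 65]
```

With how='left', if df2 has more rows for a key than df1, the extra Totals find no partner and are silently dropped.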

DataFrame constructors:

import pandas as pd

df1 = pd.DataFrame({
    'Class': ['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I'],
    'Section': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B'],
    'ID': [12, 12, 12, 16, 16, 16, 16, 14],
    'Subject': ['Maths', 'Chemistry', 'Physics', 'Maths', 'Chemistry',
                'Physics', 'Arts', 'Arts'],
    'Score': [70, 85, 75, 70, 85, 75, 65, 60]
})

df2 = pd.DataFrame({
    'Class': ['I', 'I', 'I', 'I'],
    'Section': ['A', 'A', 'A', 'B'],
    'ID': [12, 16, 16, 14],
    'Subject': ['Total', 'Total', 'Total', 'Total'],
    'Score': [230, 230, 65, 65]
})
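A left merge silently discards any df2 row whose position has no partner in df1. A quick check that each key's Totals still add up to the df2 Scores can therefore be useful. Below is a sketch using a hypothetical helper, totals_preserved, and the final table exactly as printed above:

```python
import pandas as pd

cols = ['Class', 'Section', 'ID']

df2 = pd.DataFrame({
    'Class': ['I'] * 4,
    'Section': ['A', 'A', 'A', 'B'],
    'ID': [12, 16, 16, 14],
    'Subject': ['Total'] * 4,
    'Score': [230, 230, 65, 65],
})

# Final table as printed in the answer
df3 = pd.DataFrame({
    'Class': ['I'] * 8,
    'Section': ['A'] * 7 + ['B'],
    'ID': [12, 12, 12, 16, 16, 16, 16, 14],
    'Subject': ['Maths', 'Chemistry', 'Physics', 'Maths', 'Chemistry',
                'Physics', 'Arts', 'Arts'],
    'Score': [70, 85, 75, 70, 85, 75, 65, 60],
    'Total': [230, 0, 0, 230, 65, 0, 0, 65],
})


def totals_preserved(df2, df3, cols):
    """Hypothetical helper: compare per-key df2 Score sums with df3 Total sums."""
    expected = df2.groupby(cols)['Score'].sum()
    # Keys missing from df3 count as 0 so they make the check fail
    actual = df3.groupby(cols)['Total'].sum().reindex(expected.index, fill_value=0)
    return bool((expected == actual).all())


print(totals_preserved(df2, df3, cols))  # True
# Removing the row that carried the second ID 16 Total (65) breaks the check
print(totals_preserved(df2, df3[df3['Subject'] != 'Chemistry'], cols))  # False
```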
