One-to-many join in Python: filling zeros in the duplicate records created by the join

Class  Section  ID  Subject    Score
I      A        12  Maths      70
I      A        12  Chemistry  85
I      A        12  Physics    75
I      A        16  Maths      70
I      A        16  Chemistry  85
I      A        16  Physics    75
I      A        16  Arts       65
I      B        14  Arts       60

Class  Section  ID  Subject    Score  Total
I      A        12  Maths      70     230
I      A        12  Chemistry  85     0
I      A        12  Physics    75     0
I      A        16  Maths      70     230
I      A        16  Chemistry  85     65
I      A        16  Physics    75     0
I      A        16  Arts       65     0
I      B        14  Arts       60     60

1 Answer

User

#1 · Posted 2024-06-26 01:37:38

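A very late answer, but the rows of each group can be enumerated with groupby cumcount, and that enumeration can then be used as an extra key in the merge: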

cols = ['Class', 'Section', 'ID']
df3 = (
    df1.merge(df2.drop('Subject', axis=1)  # Remove unneeded column from df2
              .rename(columns={'Score': 'Total'}),  # Fix column name for output
              left_on=[*cols, df1.groupby(cols).cumcount()],
              right_on=[*cols, df2.groupby(cols).cumcount()],
              how='left')
        .drop('key_3', axis=1)  # remove added merge key
)

df3:

  Class Section  ID    Subject  Score  Total
0     I       A  12      Maths     70  230.0
1     I       A  12  Chemistry     85    NaN
2     I       A  12    Physics     75    NaN
3     I       A  16      Maths     70  230.0
4     I       A  16  Chemistry     85   65.0
5     I       A  16    Physics     75    NaN
6     I       A  16       Arts     65    NaN
7     I       B  14       Arts     60   65.0  # 65 comes from df2; the question's expected output shows 60
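To see why only some rows pick up a Total, it may help to look at the enumeration on its own. The following is a minimal sketch that rebuilds the same df1 and df2 as this answer and prints what groupby cumcount yields for each; the names left_pos and right_pos are only illustrative and do not appear in the original code.

```python
import pandas as pd

# Same data as the DataFrame constructors in this answer
df1 = pd.DataFrame({
    'Class': ['I'] * 8,
    'Section': ['A'] * 7 + ['B'],
    'ID': [12, 12, 12, 16, 16, 16, 16, 14],
    'Subject': ['Maths', 'Chemistry', 'Physics', 'Maths', 'Chemistry',
                'Physics', 'Arts', 'Arts'],
    'Score': [70, 85, 75, 70, 85, 75, 65, 60],
})
df2 = pd.DataFrame({
    'Class': ['I'] * 4,
    'Section': ['A', 'A', 'A', 'B'],
    'ID': [12, 16, 16, 14],
    'Subject': ['Total'] * 4,
    'Score': [230, 230, 65, 65],
})

cols = ['Class', 'Section', 'ID']

# cumcount numbers the rows inside each (Class, Section, ID) group from 0
left_pos = df1.groupby(cols).cumcount()    # illustrative name
right_pos = df2.groupby(cols).cumcount()   # illustrative name

print(left_pos.tolist())   # [0, 1, 2, 0, 1, 2, 3, 0]
print(right_pos.tolist())  # [0, 0, 1, 0]
```

A row of df1 finds a match only when df2 has a row with the same Class, Section, ID and the same position number, so ID 16 matches at positions 0 and 1, and every other position is left as NaN by the left merge.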

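Then fillna and astype fix the Total column: rows without a match hold NaN after the left merge, which also turns the column into floats: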

df3['Total'] = df3['Total'].fillna(0).astype(int)

df3:

  Class Section  ID    Subject  Score  Total
0     I       A  12      Maths     70    230
1     I       A  12  Chemistry     85      0
2     I       A  12    Physics     75      0
3     I       A  16      Maths     70    230
4     I       A  16  Chemistry     85     65
5     I       A  16    Physics     75      0
6     I       A  16       Arts     65      0
7     I       B  14       Arts     60     65
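As a possible variation, the enumeration can be stored in a named helper column on both frames, which avoids relying on an auto-generated merge key column name. This is only a sketch under that assumption: the column name n and the variables left, right and out are hypothetical and not part of the original answer. The validate argument of merge is used here as an optional check that the extended key is unique on both sides.

```python
import pandas as pd

# Same data as the DataFrame constructors in this answer
df1 = pd.DataFrame({
    'Class': ['I'] * 8,
    'Section': ['A'] * 7 + ['B'],
    'ID': [12, 12, 12, 16, 16, 16, 16, 14],
    'Subject': ['Maths', 'Chemistry', 'Physics', 'Maths', 'Chemistry',
                'Physics', 'Arts', 'Arts'],
    'Score': [70, 85, 75, 70, 85, 75, 65, 60],
})
df2 = pd.DataFrame({
    'Class': ['I'] * 4,
    'Section': ['A', 'A', 'A', 'B'],
    'ID': [12, 16, 16, 14],
    'Subject': ['Total'] * 4,
    'Score': [230, 230, 65, 65],
})

cols = ['Class', 'Section', 'ID']
key = [*cols, 'n']  # 'n' is a hypothetical helper column name

# Store the within-group position as an explicit column on each side
left = df1.assign(n=df1.groupby(cols).cumcount())
right = (
    df2.drop(columns='Subject')
       .rename(columns={'Score': 'Total'})
       .assign(n=df2.groupby(cols).cumcount())
)

out = (
    left.merge(right, on=key, how='left',
               validate='one_to_one')  # raises if the key is not unique on either side
        .drop(columns='n')             # remove the helper column again
)
out['Total'] = out['Total'].fillna(0).astype(int)
print(out)
```

With this data, out should hold the same rows and columns as the final df3 above.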

DataFrame constructors:

import pandas as pd

df1 = pd.DataFrame({
    'Class': ['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I'],
    'Section': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B'],
    'ID': [12, 12, 12, 16, 16, 16, 16, 14],
    'Subject': ['Maths', 'Chemistry', 'Physics', 'Maths', 'Chemistry',
                'Physics', 'Arts', 'Arts'],
    'Score': [70, 85, 75, 70, 85, 75, 65, 60]
})

df2 = pd.DataFrame({
    'Class': ['I', 'I', 'I', 'I'],
    'Section': ['A', 'A', 'A', 'B'],
    'ID': [12, 16, 16, 14],
    'Subject': ['Total', 'Total', 'Total', 'Total'],
    'Score': [230, 230, 65, 65]
})
