Merge two DataFrames where a df1 column holds comma-separated values matching a df2 column, and df1.Column2 = df2.Column2

Posted 2024-10-03 17:17:37

Input: two DataFrames with the following values:

df1:

| Employee Name | EmployeeID | workDate   |
|---------------|------------|------------|
| John          | 2,22       | 2020-11-01 |
| John          | 2,22       | 2020-11-02 |
| Kim           | 3          | 2020-11-01 |

df2:

| EmployeeID | workDate   | Hours |
|------------|------------|-------|
| 2          | 2020-11-01 | 8     |
| 22         | 2020-11-02 | 2     |
| 3          | 2020-11-01 | 10    |
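
To make the example reproducible, the two tables above can be built roughly as follows. This is a minimal sketch: the dtypes are assumptions (the question does not state them), with `EmployeeID` stored as a comma-separated string in `df1` and as integers in `df2`, and `workDate` kept as plain strings in both.

```python
import pandas as pd

# df1: EmployeeID is assumed to be a comma-separated string of IDs
df1 = pd.DataFrame({
    "Employee Name": ["John", "John", "Kim"],
    "EmployeeID": ["2,22", "2,22", "3"],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
})

# df2: EmployeeID is assumed to be stored as integers, one ID per row
df2 = pd.DataFrame({
    "EmployeeID": [2, 22, 3],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
    "Hours": [8, 2, 10],
})
```

If `workDate` is a datetime column in one frame and a string in the other, the two would need to be converted to the same type before joining.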

I need to join these two DataFrames where df2.EmployeeID is one of the comma-separated IDs in df1.EmployeeID and df2.workDate == df1.workDate.

Expected output:

| Employee Name | EmployeeID | workDate   | Hours |
|---------------|------------|------------|-------|
| John          | 2,22       | 2020-11-01 | 8     |
| John          | 2,22       | 2020-11-02 | 2     |
| Kim           | 3          | 2020-11-01 | 10    |

1 answer

Answer 1 · Posted 2024-10-03 17:17:37

Use `DataFrame.explode` on the split values, then `DataFrame.merge` with a left join, and finally aggregate back with `GroupBy.agg`:

# convert to strings so they match the split values
df2['EmployeeID'] = df2['EmployeeID'].astype(str)

# split into lists, explode to one ID per row, left-join, then aggregate back
df1 = (df1.assign(EmployeeID = df1['EmployeeID'].str.split(r'\s*,\s*'))
          .explode('EmployeeID')
          .merge(df2, on=['EmployeeID','workDate'], how='left')
          .groupby(['Employee Name','workDate'], as_index=False, sort=False)
          .agg({'EmployeeID':','.join, 'Hours':'sum'}))
print(df1)
  Employee Name    workDate EmployeeID  Hours
0          John  2020-11-01       2,22    8.0
1          John  2020-11-02       2,22    2.0
2           Kim  2020-11-01          3   10.0
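
For anyone who wants to run the approach end to end, here is a self-contained sketch of the same steps. The helper name `merge_on_id_list` is hypothetical, and the input dtypes are assumptions (string IDs in `df1`, integer IDs in `df2`, `workDate` as strings).

```python
import pandas as pd

def merge_on_id_list(left, right):
    """Hypothetical helper: attach right's Hours to left wherever
    right.EmployeeID is one of left's comma-separated IDs and workDate matches."""
    # IDs become strings after splitting, so cast the other side to match
    right = right.assign(EmployeeID=right["EmployeeID"].astype(str))
    # One row per individual ID
    exploded = (
        left.assign(EmployeeID=left["EmployeeID"].str.split(r"\s*,\s*"))
            .explode("EmployeeID")
    )
    # Left join keeps IDs without a match; their Hours become NaN
    merged = exploded.merge(right, on=["EmployeeID", "workDate"], how="left")
    # Collapse back to one row per employee and date
    return (
        merged.groupby(["Employee Name", "workDate"], as_index=False, sort=False)
              .agg({"EmployeeID": ",".join, "Hours": "sum"})
    )

# Example data, with assumed dtypes
df1 = pd.DataFrame({
    "Employee Name": ["John", "John", "Kim"],
    "EmployeeID": ["2,22", "2,22", "3"],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
})
df2 = pd.DataFrame({
    "EmployeeID": [2, 22, 3],
    "workDate": ["2020-11-01", "2020-11-02", "2020-11-01"],
    "Hours": [8, 2, 10],
})

result = merge_on_id_list(df1, df2)
print(result)
```

Two things to keep in mind. `Hours` comes back as float because the left join introduces NaN for IDs that have no match on a given date. Also, `'sum'` skips NaN, so a group in which none of the IDs match would get 0 rather than NaN; if that distinction matters, aggregating `Hours` with `lambda s: s.sum(min_count=1)` is one option.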
  
