Pandas: finding the odd rows out between two DataFrames when only looking at certain columns

Posted 2024-05-20 15:45:44

I have two dataframes. One is edited by users of a PowerApp; the other comes straight from OneDrive.

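The column headers are almost identical. I need to compare the two dataframes and add any new rows to the one that comes from PowerApps. Here are the two example dataframes: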

PowerApps dataframe:

          Send/Collect            Hospital   Courier                      Kit                      Manufacturer  Status
0                Send     Nuffield Ipswich   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
1                 Send         BMI Rosshal   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  In Progress
2              Collect       Stepping Hill   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
3              Collect       York District  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
4  Royal Devon Exeter                  NaN       NaN  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Not Started
5              collect       Spire Bristol  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx  Complete
6                 Send         Bridlington  Courier        ToeMotion - MTP DF  Arthrosurface Hire Log 2018.xlsx  Not Started
7   Send Femoral Head    Hampshire Clinic        DHL             Human Tissue             Human Tissue Log.xlsx   Not Started

OneDrive dataframe:

          Send/Collect            Hospital   Courier                      Kit                      Manufacturer
0                Send     Nuffield Ipswich   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
1                 Send         BMI Rosshal   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
2              Collect       Stepping Hill   Courier  ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
3              Collect       York District  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
4  Royal Devon Exeter                                 ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
5              collect       Spire Bristol  Courier   ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
6  Royal Devon Exeter                                 ActivMotion (HTO - DFO)        NewClip Hire Log 2018.xlsx
7                 Send         Bridlington  Courier        ToeMotion - MTP DF  Arthrosurface Hire Log 2018.xlsx
8   Send Femoral Head    Hampshire Clinic        DHL             Human Tissue             Human Tissue Log.xlsx 

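As you can see, the PowerApps dataframe has one extra column, Status (which can hold different values, not just "Not Started"), and the OneDrive dataframe has one extra row (which needs to go into the PowerApps df).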

Also note that empty cells in the OneDrive dataframe are the empty string "", whereas empty cells from PowerApps are NaN.

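I need to merge the extra row from OneDrive into the PowerApps dataframe (giving that row a "Not Started" status). I think I need a method that matches rows on columns 0, 3 and 4 while ignoring columns 1, 2 and 5. How can I do this?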


1 Answer
User
Answer 1 · Posted 2024-05-20 15:45:44

I think concat is a good fit here:

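import numpy as np
import pandas as pd

# Replace the empty strings in the onedrive dataframe with NaN
onedrive = onedrive.replace('', np.nan)
# Put powerapp first so its rows (and their Status) win when duplicates are dropped
powerapp = pd.concat([powerapp, onedrive])

# Rows that came from onedrive have no Status yet
powerapp['Status'] = powerapp['Status'].fillna('Not Started')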

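Then drop the redundant rows based on a subset of the columns, using drop_duplicates with its subset argument.
Note: reset the index after combining the frames.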
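
Putting the steps above together, a minimal sketch might look like the following. The helper name append_new_rows, the KEY_COLS list and the temporary _occ column are hypothetical names introduced here for illustration, and the two small frames are a trimmed-down version of the question's sample data. One thing to watch for: in the sample, the extra OneDrive row is identical to an existing row on the key columns, so a plain drop_duplicates on those columns would discard it. The sketch therefore numbers repeated key combinations with groupby(...).cumcount() before de-duplicating, which assumes that repeated rows should be kept and that the key columns contain no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical key columns: positions 0, 3 and 4 in the question's frames.
KEY_COLS = ["Send/Collect", "Kit", "Manufacturer"]


def append_new_rows(powerapp, onedrive, key_cols=KEY_COLS):
    """Append to `powerapp` the rows of `onedrive` that it lacks on `key_cols`."""
    # OneDrive uses "" for empty cells, PowerApps uses NaN; normalise first.
    onedrive = onedrive.replace("", np.nan)

    # Number repeated key combinations (0, 1, 2, ...) within each frame so a
    # second identical row counts as a separate row instead of a duplicate.
    # Assumes the key columns themselves contain no missing values.
    powerapp = powerapp.assign(_occ=powerapp.groupby(key_cols).cumcount())
    onedrive = onedrive.assign(_occ=onedrive.groupby(key_cols).cumcount())

    # PowerApps rows go first so keep="first" preserves their Status values.
    combined = pd.concat([powerapp, onedrive], ignore_index=True)
    combined = combined.drop_duplicates(subset=key_cols + ["_occ"], keep="first")

    # Only rows that came from OneDrive are still missing a Status.
    combined["Status"] = combined["Status"].fillna("Not Started")
    return combined.drop(columns="_occ").reset_index(drop=True)


# Trimmed-down version of the question's sample data.
powerapp = pd.DataFrame({
    "Send/Collect": ["Send", "Royal Devon Exeter"],
    "Hospital": ["BMI Rosshal", np.nan],
    "Courier": ["Courier", np.nan],
    "Kit": ["ActivMotion (HTO - DFO)"] * 2,
    "Manufacturer": ["NewClip Hire Log 2018.xlsx"] * 2,
    "Status": ["In Progress", "Not Started"],
})
onedrive = pd.DataFrame({
    "Send/Collect": ["Send", "Royal Devon Exeter", "Royal Devon Exeter"],
    "Hospital": ["BMI Rosshal", "", ""],
    "Courier": ["Courier", "", ""],
    "Kit": ["ActivMotion (HTO - DFO)"] * 3,
    "Manufacturer": ["NewClip Hire Log 2018.xlsx"] * 3,
})

result = append_new_rows(powerapp, onedrive)
print(result)
```

If repeated rows should instead collapse into one, dropping the _occ step and de-duplicating on the key columns alone gives the simpler behaviour the answer describes.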
