Merge rows of a dataframe based on the values in one column, concatenating the strings from the other columns into a single row

Posted 2024-09-30 18:20:05


I have a dataframe that looks like this:

import pandas as pd

df1 = pd.DataFrame({
                   "Business_Process_Activity" : ["SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
                   "Case":[1,1,2,2,2,3,4],
                   "Application":["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftOutlook", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"], 
                   "Activity_of_the_User":["SavingADocument", "SendingAnEmail", "SavingADocument", "SendingAnEmail", "SendingAnEmail", "SavingADocument", "SavingADocument"],
                   "Receiver_email_root":["None", "idatta91 adarandall larryjacob", "None", "idatta91 larryjacob"," vanessaHudgens prithakaur", "None", "None"],
                   "Receiever_email_domains":["None", "gmail yahoo", "None", "gmail", "gmail yahoo", "None", "None"],
                   "Receiver_email_count_Catg":["None", "Few", "None", "Double", "Double", "None", "None"],
                   "Subject":["None","Activity Report", "None", "Project Progress Report", "Project Progress Report 2", "None", "None"]
                   })

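I want to merge the rows of the dataframe based on the Case column. If two or more rows have the same number in the Case column, the strings in the other columns of those rows should be concatenated into a single row.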

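Also, rows with the same Case number always have the same value in the Business_Process_Activity column. For that column I do not want to concatenate the values but keep only one of them, because the column needs to be categorical. I would like the final dataframe to look like this: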

df2 = pd.DataFrame({"Case":[1,2,3,4],
               "Business_Process_Activity" : ["SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
               "Application":["MicrosoftWord MicrosoftOutlook", "MicrosoftWord MicrosoftOutlook MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"], 
               "Activity_of_the_User":["SavingADocument SendingAnEmail","SavingADocument SendingAnEmail SendingAnEmail", "SavingADocument", "SavingADocument"],
               "Receiver_email_root":["idatta91 adarandall larryjacob", "idatta91 larryjacob vanessaHudgens prithakaur", "None", "None"],
               "Receiever_email_domains":["gmail yahoo","gmail gmail yahoo", "None", "None"],
               "Receiver_email_count_Catg":["Few", "Double Double", "None", "None"],
               "Subject":["Activity Report", "Project Progress Report Project Progress Report 2", "None", "None"]
               })

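If a string is merged with a "None" value, the "None" string should be dropped, because the merged value is no longer empty. When rows are merged into one, the duplicate numbers in the Case column should be removed.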

How can I do this? Thanks in advance.


Tags: report, none, email, business, activity, process, gmail, case

2 Answers

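The idea is to drop missing values and "None" strings within each group, join what remains, and finally replace the resulting empty strings with None: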

df = (df1.groupby('Case')
         .agg(lambda x: ' '.join(x[x.ne('None') & x.notna()]))
         .where(lambda x: x.astype(bool), None)
         .reset_index())

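Another solution, with a custom function:

def f(x):
    # Keep only real values: drop missing entries and the literal string 'None'
    y = x[x.ne('None') & x.notna()]
    # Return None when nothing is left in the group, otherwise join with spaces
    return None if y.empty else ' '.join(y)

df = df1.groupby('Case').agg(f).reset_index()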
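
Note that the two variants above apply the same join to every column, so Business_Process_Activity is concatenated within each group as well, and empty groups become Python None rather than the string "None" used in the expected output. A minimal sketch of one way to handle this with a per-column aggregation dict follows. The helper name join_non_none and the reduced sample frame are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Reduced, illustrative version of the question's data (assumption: fewer columns and rows).
df1 = pd.DataFrame({
    "Business_Process_Activity": ["SendingReportToManager"] * 3 + ["PreparingAndSendingAgenda"],
    "Case": [1, 1, 2, 3],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Subject": ["None", "Activity Report", "None", "None"],
})

def join_non_none(s):
    """Hypothetical helper: join real values, fall back to the string 'None'."""
    kept = s[s.notna() & s.ne("None")]
    return " ".join(kept) if not kept.empty else "None"

keep_first = "Business_Process_Activity"

# Join every text column except the one that should stay a single category.
agg_spec = {col: join_non_none for col in df1.columns if col not in ("Case", keep_first)}
agg_spec[keep_first] = "first"

result = df1.groupby("Case").agg(agg_spec).reset_index()

# Restore a readable column order: Case, the categorical column, then the rest.
others = [c for c in result.columns if c not in ("Case", keep_first)]
result = result[["Case", keep_first] + others]
print(result)
```

Keeping the categorical column out of the join avoids values such as "SendingReportToManager SendingReportToManager", which would otherwise have to be cleaned up afterwards.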

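Use:

g = df1.groupby('Case')
# Join the non-'None' strings of each group; keep a single 'None' if the group has nothing else
df2 = g.agg(lambda s: ' '.join(s[s.ne('None')] if s.ne('None').any() else ['None']))
# Overwrite the joined activity column with the first value of each group
df2['Business_Process_Activity'] = g['Business_Process_Activity'].first()
df2 = df2.reset_index()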

print(df2)

   Case  Business_Process_Activity  ... Receiver_email_count_Catg                                            Subject
0     1     SendingReportToManager  ...                       Few                                    Activity Report
1     2     SendingReportToManager  ...             Double Double  Project Progress Report Project Progress Report 2
2     3  PreparingAndSendingAgenda  ...                      None                                               None
3     4  PreparingAndSendingAgenda  ...                      None                                               None
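
One detail of the sample data: the value " vanessaHudgens prithakaur" in Receiver_email_root starts with a space, so a plain ' '.join leaves a double space in the merged string for Case 2. A small sketch of one way to normalise this, assuming single-space-separated tokens are wanted; the helper name join_tokens and the two-column frame are hypothetical:

```python
import pandas as pd

def join_tokens(s):
    """Hypothetical helper: join non-'None' strings and collapse repeated whitespace."""
    tokens = " ".join(s[s.ne("None")]).split()
    return " ".join(tokens) if tokens else "None"

# Only the rows needed to show the effect (assumption: Cases 2 and 3 from the question).
roots = pd.DataFrame({
    "Case": [2, 2, 2, 3],
    "Receiver_email_root": ["None", "idatta91 larryjacob", " vanessaHudgens prithakaur", "None"],
})

merged = roots.groupby("Case")["Receiver_email_root"].agg(join_tokens)
print(merged.tolist())
# ['idatta91 larryjacob vanessaHudgens prithakaur', 'None']
```

Splitting on whitespace and re-joining also handles trailing spaces and accidental double spaces inside a single cell.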
