Merging rows of a DataFrame based on the value in a specific column, by concatenating the strings from the other columns

import pandas as pd

# Input: one row per user action; the string "None" marks a missing value
df1 = pd.DataFrame({
    "Business_Process_Activity": ["SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
    "Case": [1, 1, 2, 2, 2, 3, 4],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftOutlook", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Activity_of_the_User": ["SavingADocument", "SendingAnEmail", "SavingADocument", "SendingAnEmail", "SendingAnEmail", "SavingADocument", "SavingADocument"],
    "Receiver_email_root": ["None", "idatta91 adarandall larryjacob", "None", "idatta91 larryjacob", " vanessaHudgens prithakaur", "None", "None"],
    "Receiever_email_domains": ["None", "gmail yahoo", "None", "gmail", "gmail yahoo", "None", "None"],
    "Receiver_email_count_Catg": ["None", "Few", "None", "Double", "Double", "None", "None"],
    "Subject": ["None", "Activity Report", "None", "Project Progress Report", "Project Progress Report 2", "None", "None"],
})

# Desired result: one row per Case, with the other columns' strings joined by spaces
df2 = pd.DataFrame({
    "Case": [1, 2, 3, 4],
    "Business_Process_Activity": ["SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
    "Application": ["MicrosoftWord MicrosoftOutlook", "MicrosoftWord MicrosoftOutlook MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Activity_of_the_User": ["SavingADocument SendingAnEmail", "SavingADocument SendingAnEmail SendingAnEmail", "SavingADocument", "SavingADocument"],
    "Receiver_email_root": ["idatta91 adarandall larryjacob", "idatta91 larryjacob vanessaHudgens prithakaur", "None", "None"],
    "Receiever_email_domains": ["gmail yahoo", "gmail gmail yahoo", "None", "None"],
    "Receiver_email_count_Catg": ["Few", "Double Double", "None", "None"],
    "Subject": ["Activity Report", "Project Progress Report Project Progress Report 2", "None", "None"],
})

2 answers

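User

Answer 1 · edited 2024-09-30 18:20:05

The idea is to drop the missing values and the literal 'None' strings within each group, join what is left with spaces, and finally replace any resulting empty string with None: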

df = (df1.groupby('Case')
         .agg(lambda x: ' '.join(x[x.ne('None') & x.notna()]))
         .where(lambda x: x.astype(bool), None)
         .reset_index())

Another solution, using a custom function:

def f(x):
   y = x[x.ne('None') & x.notna()]
   return None if y.empty else ' '.join(y)

df = df1.groupby('Case').agg(f).reset_index()
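
Note that with the sample df1, both snippets above also join Business_Process_Activity, so case 1 ends up with "SendingReportToManager SendingReportToManager". A minimal sketch of one way around this, assuming the activity is constant within each case, is to add it to the grouping keys. The miniature frame `small` and the helper name `join_non_missing` are illustrative and not part of the question:

```python
import pandas as pd

# Hypothetical miniature of df1: the string "None" marks a missing value.
small = pd.DataFrame({
    "Case": [1, 1, 2],
    "Business_Process_Activity": ["SendingReport", "SendingReport", "PreparingAgenda"],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord"],
    "Subject": ["None", "Activity Report", "None"],
})

def join_non_missing(col):
    """Join the real values of one group; return None if nothing is left."""
    kept = col[col.ne("None") & col.notna()]
    return None if kept.empty else " ".join(kept)

# Grouping by both keys keeps the activity as a single value per case.
result = (small.groupby(["Case", "Business_Process_Activity"], sort=False)
               .agg(join_non_missing)
               .reset_index())
print(result)
```

If the activity can differ within a case, this grouping would produce more than one row per case, so it only fits data shaped like the example.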

User

Answer 2 · edited 2024-09-30 18:20:05

Use:

g = df1.groupby('Case')
df2 = g.agg(lambda s: ' '.join(s[s.ne('None')] if s.ne('None').any() else ['None']))
df2['Business_Process_Activity'] = g['Business_Process_Activity'].first()
df2 = df2.reset_index()

Output of print(df2):

   Case  Business_Process_Activity  ... Receiver_email_count_Catg                                            Subject
0     1     SendingReportToManager  ...                       Few                                    Activity Report
1     2     SendingReportToManager  ...             Double Double  Project Progress Report Project Progress Report 2
2     3  PreparingAndSendingAgenda  ...                      None                                               None
3     4  PreparingAndSendingAgenda  ...                      None                                               None
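
The second-pass overwrite of Business_Process_Activity can also be folded into a single agg call by passing a per-column mapping. This is a minimal sketch under the same assumption as the answer (missing values are the literal string 'None'); the miniature frame `small` and the helper `join_or_none_str` are made up for illustration:

```python
import pandas as pd

# Hypothetical miniature of df1: the string "None" marks a missing value.
small = pd.DataFrame({
    "Case": [1, 1, 2],
    "Business_Process_Activity": ["SendingReport", "SendingReport", "PreparingAgenda"],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord"],
    "Subject": ["None", "Activity Report", "None"],
})

def join_or_none_str(s):
    """Join values other than 'None'; keep the string 'None' if none remain."""
    kept = s[s.ne("None")]
    return " ".join(kept) if not kept.empty else "None"

# Use the joining helper for every non-key column, then override the
# activity column so it takes the first value of each group instead.
how = {col: join_or_none_str for col in small.columns.drop("Case")}
how["Business_Process_Activity"] = "first"

merged = small.groupby("Case").agg(how).reset_index()
print(merged)
```

Unlike the first answer, this keeps 'None' as a string in groups that have no real values, which matches the desired df2 in the question.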
