How can I aggregate rows with similar string values into new rows in a DataFrame?

╔════════════════════════╦══════════╗
║ Column A               ║ Column B ║
╠════════════════════════╬══════════╣
║ /                      ║ 5.34     ║
║ new-shirts             ║ 6.78     ║
║ new-pants              ║ 10.11    ║
║ used-hats              ║ 1.56     ║
║ used-shirts            ║ 3.78     ║
║ brand-new-watches/gold ║ 4.21     ║
║ customer-service       ║ 0.29     ║
║ holiday-blowout-sale   ║ 12.45    ║
║ used-pants/corduroy    ║ 2.98     ║
║ special-discounts      ║ 6.99     ║
║ contact-us             ║ 1.67     ║
╚════════════════════════╩══════════╝

╔══════════╦══════════╗
║ Column A ║ Column B ║
╠══════════╬══════════╣
║ Home     ║ 5.34     ║
║ New      ║ 7.03     ║
║ Used     ║ 2.77     ║
║ Service  ║ 0.29     ║
║ Other    ║ 7.04     ║
╚══════════╩══════════╝

╔═════════════════════════════════╦══════════╗
║ Column A                        ║ Column B ║
╠═════════════════════════════════╬══════════╣
║ /                               ║ 5.34     ║
║ /new-shirts/                    ║ 6.78     ║
║ /new-pants/                     ║ 10.11    ║
║ /used-hats/                     ║ 1.56     ║
║ /used-shirts/                   ║ 3.78     ║
║ /brand-new-watches/gold/        ║ 4.21     ║
║ /customer-service/              ║ 0.29     ║
║ /holiday-blowout-sale/december/ ║ 12.45    ║
║ /used-pants/corduroy/           ║ 2.98     ║
║ /special-discounts/             ║ 6.99     ║
║ /contact-us/                    ║ 1.67     ║
╚═════════════════════════════════╩══════════╝

1 Answer

User

#1 · Posted on 2024-10-01 04:55:12

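We can define the keywords that identify each category, then use Series.str.extract to pull the first matching keyword out of each string, ignoring case. Rows that match no keyword are labelled "Other".

We then use GroupBy.mean to get the average of Column B for each category: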

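import re

# Keywords that identify each category; '/' stands for the home page
words = ['/', 'New', 'Used', 'Service']

# Extract the first keyword found in each value, ignoring case;
# rows without a match fall into 'other'
cats = (
    df['Column A'].str.extract('(' + '|'.join(words) + ')', flags=re.IGNORECASE)
                  .fillna('other')[0]
                  .str.capitalize()
                  .str.replace('/', 'Home', regex=False)
)

# Average Column B per category, keeping the order in which categories first appear
df = df.groupby(cats, sort=False)['Column B'].mean().rename_axis('Column A', axis=0).reset_index()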

  Column A  Column B
0     Home  5.340000
1      New  7.033333
2     Used  2.773333
3  Service  0.290000
4    Other  7.036667
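
The third table in the question shows the paths wrapped in slashes (for example /new-shirts/). With values like that, a pattern that includes '/' as a keyword would match the leading slash first and put every row under Home. Below is a minimal sketch of one way to handle that case, assuming the home page is exactly the string '/' and that the remaining keywords are the same as above. The sample DataFrame is rebuilt from the question's table, and the names pattern and result are only illustrative.

```python
import re
import pandas as pd

# Sample data copied from the question's third table (slash-wrapped paths)
df = pd.DataFrame({
    'Column A': ['/', '/new-shirts/', '/new-pants/', '/used-hats/', '/used-shirts/',
                 '/brand-new-watches/gold/', '/customer-service/',
                 '/holiday-blowout-sale/december/', '/used-pants/corduroy/',
                 '/special-discounts/', '/contact-us/'],
    'Column B': [5.34, 6.78, 10.11, 1.56, 3.78, 4.21, 0.29, 12.45, 2.98, 6.99, 1.67],
})

# '/' is left out of the keyword list so it cannot match the leading slash
words = ['New', 'Used', 'Service']
pattern = '(' + '|'.join(map(re.escape, words)) + ')'

cats = (
    df['Column A'].str.extract(pattern, flags=re.IGNORECASE)[0]
                  .str.capitalize()
                  .fillna('Other')   # no keyword found
)

# Assumption: only the exact path '/' counts as the home page
cats = cats.mask(df['Column A'].eq('/'), 'Home')

result = (
    df.groupby(cats, sort=False)['Column B']
      .mean()
      .rename_axis('Column A')
      .reset_index()
)
print(result)
```

On this sample data the sketch yields the same five categories (Home, New, Used, Service, Other) in the same order as the output above. If the home page can appear in other forms, such as an empty string, the equality check would need to be adjusted.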
