Answers to the question: How to group a pandas DataFrame under certain conditions

How to group a pandas DataFrame under certain conditions

1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java

Similar to Anton's answer, but using apply:

<pre><code># Flag a buyer 1 if they bought more than one distinct item
# and bought on more than one distinct date, otherwise 0
users = df.groupby('buyer_id').apply(
    lambda r: int(r['item_id'].nunique() > 1 and r['date'].nunique() > 1)
)

# Align the per-buyer flag back onto every row through the buyer_id index
df.set_index('buyer_id', inplace=True)
df['good_user'] = users
</code></pre>

Edit, because I thought of another case: suppose the data shows a buyer who bought the same two (or more) items on two different days. Should this buyer be flagged 1 or 0? After all, on the second day he or she did not choose anything different. Look at buyer 81 in the table below: on both dates they bought only items 49 and 50.

<pre><code>buyer_id  item_id  order_id        date
     139       57       387  2015-12-28
     140        9       388  2015-12-28
     140       57       389  2015-12-28
      36        9       390  2015-12-28
      64       49       404  2015-12-29
     146       49       405  2015-12-29
      81       49       406  2015-12-29
     140       80       407  2015-12-30
     139       81       408  2015-12-30
      81       50       406  2015-12-29
      81       49       999  2015-12-30
      81       50       999  2015-12-30
</code></pre>

To handle this case I came up with the following approach (a bit ugly, but it should work):

<pre><code># this function is applied to all buyers
def find_good_buyers(buyer):
    # group the buyer's rows by the dates on which they made a purchase
    buyer_dates = buyer.groupby('date')
    # a string representing the unique items purchased on each date;
    # sorting makes "49-50" and "50-49" the same combination
    items_on_date = buyer_dates.agg({'item_id': lambda x: '-'.join(sorted(x.unique()))})
    # if there is more than 1 combination of item_id, then it means that
    # the buyer has purchased different things on different dates,
    # so this buyer must be flagged 1
    good_buyer = (len(items_on_date.groupby('item_id').groups) > 1) * 1
    return good_buyer

# use str, not 'S': in Python 3 'S' gives bytes, which '-'.join() rejects
df['item_id'] = df['item_id'].astype(str)
buyers = df.groupby('buyer_id')
good_buyer = buyers.apply(find_good_buyers)
df.set_index('buyer_id', inplace=True)
df['good_buyer'] = good_buyer
df.reset_index(inplace=True)
</code></pre>

This handles buyer 81 correctly and flags it 0: once its rows are grouped by date, both purchase dates have the same "49-50" item combination, so the number of distinct combinations is 1 and the buyer is flagged 0.
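The same "different items on different dates" rule can also be written without a custom function. The sketch below is not the answer's own method: it swaps the joined string for a per-date `frozenset` of items (so item order cannot matter) and then counts distinct sets per buyer with `nunique`. The DataFrame is rebuilt by hand from the sample table, the column name `good_buyer` follows the answer, and `baskets` and `good` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the table above (assumption: same columns and values)
df = pd.DataFrame({
    "buyer_id": [139, 140, 140, 36, 64, 146, 81, 140, 139, 81, 81, 81],
    "item_id":  [57, 9, 57, 9, 49, 49, 49, 80, 81, 50, 49, 50],
    "order_id": [387, 388, 389, 390, 404, 405, 406, 407, 408, 406, 999, 999],
    "date": pd.to_datetime([
        "2015-12-28", "2015-12-28", "2015-12-28", "2015-12-28",
        "2015-12-29", "2015-12-29", "2015-12-29", "2015-12-30",
        "2015-12-30", "2015-12-29", "2015-12-30", "2015-12-30",
    ]),
})

# Hypothetical name: one frozenset of items per (buyer, date);
# frozenset ignores order and duplicates, so {49, 50} == {50, 49}
baskets = df.groupby(["buyer_id", "date"])["item_id"].agg(frozenset)

# Hypothetical name: a buyer is flagged 1 if their item set differs between dates,
# i.e. they have more than one distinct basket across all their purchase dates
good = (baskets.groupby(level="buyer_id").nunique() > 1).astype(int)

# Broadcast the per-buyer flag back onto every row
df["good_buyer"] = df["buyer_id"].map(good)
print(good.to_dict())
```

With the sample data, buyers 139 and 140 get 1 because their item sets change between dates, while buyer 81 gets 0 because both of its dates have the set {49, 50}. Buyers 36, 64 and 146 bought on only one date, so they also get 0.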