<p>Similar to Anton's answer, but using <code>apply</code>:</p>
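<pre><code># flag a buyer as 1 if they bought more than one distinct item
# on more than one distinct date, otherwise 0
users = df.groupby('buyer_id').apply(
    lambda r: r['item_id'].unique().shape[0] > 1 and
              r['date'].unique().shape[0] > 1
) * 1
# align the per-buyer flags with the rows through the buyer_id index
df.set_index('buyer_id', inplace=True)
df['good_user'] = users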
</code></pre>
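<p><strong>Edit</strong>, because I thought of another case: suppose the data shows a buyer purchasing the same two (or more) items on two different days. Should this user be flagged as 1 or 0? In practice, he/she did not choose anything different on the second date.
Buyer 81 in the table below is such a case: on both dates they bought only items 49 and 50.</p>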
<pre><code>buyer_id  item_id  order_id        date
     139       57       387  2015-12-28
     140        9       388  2015-12-28
     140       57       389  2015-12-28
      36        9       390  2015-12-28
      64       49       404  2015-12-29
     146       49       405  2015-12-29
      81       49       406  2015-12-29
     140       80       407  2015-12-30
     139       81       408  2015-12-30
      81       50       406  2015-12-29
      81       49       999  2015-12-30
      81       50       999  2015-12-30
</code></pre>
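<p>To make the problem concrete, here is a minimal sketch that rebuilds the table above and applies the simple "more than one item and more than one date" check. The names <code>per_buyer</code> and <code>simple_flag</code> are only illustrative, and named aggregation assumes a reasonably recent pandas (0.25 or later). Under this check buyer 81 is flagged as 1, even though they bought the same basket on both dates.</p>

```python
import pandas as pd

# Sample rows copied from the table above
df = pd.DataFrame({
    'buyer_id': [139, 140, 140, 36, 64, 146, 81, 140, 139, 81, 81, 81],
    'item_id':  [57, 9, 57, 9, 49, 49, 49, 80, 81, 50, 49, 50],
    'order_id': [387, 388, 389, 390, 404, 405, 406, 407, 408, 406, 999, 999],
    'date': ['2015-12-28', '2015-12-28', '2015-12-28', '2015-12-28',
             '2015-12-29', '2015-12-29', '2015-12-29', '2015-12-30',
             '2015-12-30', '2015-12-29', '2015-12-30', '2015-12-30'],
})

# Count distinct items and distinct dates per buyer
per_buyer = df.groupby('buyer_id').agg(
    n_items=('item_id', 'nunique'),
    n_dates=('date', 'nunique'),
)

# Simple check: more than one distinct item AND more than one distinct date
simple_flag = ((per_buyer['n_items'] > 1) & (per_buyer['n_dates'] > 1)).astype(int)
print(simple_flag)
```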
<p>To handle this case, I came up with the following (a bit ugly, but it should work):</p>
<pre><code># this function is applied to the rows of each buyer
def find_good_buyers(buyer):
    # group the buyer's purchases by date
    buyer_dates = buyer.groupby('date')
    # a string representing the unique items purchased on each date
    # (sorted, so the order of the rows does not change the string)
    items_on_date = buyer_dates.agg(
        {'item_id': lambda x: '-'.join(sorted(x.unique()))})
    # if there is more than 1 combination of item_id, it means that
    # the buyer has purchased different things on different dates,
    # so this buyer must be flagged as 1
    good_buyer = (len(items_on_date.groupby('item_id').groups) > 1) * 1
    return good_buyer

# item_id must be a string for '-'.join to work
# (astype('S') would give bytes on Python 3, which str.join rejects)
df['item_id'] = df['item_id'].astype(str)
buyers = df.groupby('buyer_id')
good_buyer = buyers.apply(find_good_buyers)
# align the per-buyer flags with the rows through the buyer_id index
df.set_index('buyer_id', inplace=True)
df['good_buyer'] = good_buyer
df.reset_index(inplace=True)
</code></pre>
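<p>This works for buyer 81 and sets the flag to 0: once the purchases are grouped by date, both dates have the same "49-50" combination of items, so the number of distinct combinations is 1 and the buyer is flagged as 0.</p>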
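<p>As a cross-check, the same idea can be sketched without string joining, by comparing the set of items bought on each date. This is only an illustration under the same flagging rule, not a replacement for the code above; the helper name <code>good_buyer_flags</code> is hypothetical, and the data is the sample table from this answer.</p>

```python
import pandas as pd

# Sample rows copied from the table in this answer
df = pd.DataFrame({
    'buyer_id': [139, 140, 140, 36, 64, 146, 81, 140, 139, 81, 81, 81],
    'item_id':  [57, 9, 57, 9, 49, 49, 49, 80, 81, 50, 49, 50],
    'order_id': [387, 388, 389, 390, 404, 405, 406, 407, 408, 406, 999, 999],
    'date': ['2015-12-28', '2015-12-28', '2015-12-28', '2015-12-28',
             '2015-12-29', '2015-12-29', '2015-12-29', '2015-12-30',
             '2015-12-30', '2015-12-29', '2015-12-30', '2015-12-30'],
})

def good_buyer_flags(frame):
    """Hypothetical helper: 1 if a buyer's item sets differ between dates, else 0."""
    # buyer -> date -> set of items bought on that date
    baskets = {}
    for buyer, date, item in zip(frame['buyer_id'], frame['date'], frame['item_id']):
        baskets.setdefault(buyer, {}).setdefault(date, set()).add(item)
    # a buyer is "good" when there is more than one distinct basket across dates
    flags = {
        buyer: int(len({frozenset(items) for items in by_date.values()}) > 1)
        for buyer, by_date in baskets.items()
    }
    return pd.Series(flags, name='good_buyer')

flags = good_buyer_flags(df)
# map the per-buyer flag back onto every row without touching the index
df['good_buyer'] = df['buyer_id'].map(flags)
print(df[['buyer_id', 'good_buyer']].drop_duplicates())
```

<p>Using <code>map</code> on <code>buyer_id</code> avoids the <code>set_index</code> / <code>reset_index</code> round trip, and a <code>frozenset</code> makes the comparison independent of row order.</p>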