A more efficient way to filter out a user's past actions

common_muids = list(set(useractivity_ids).intersection(reco_ids))
final_rec1 = reco[reco.masteruserid.isin(common_muids)]
final_rec2 = reco[~(reco.masteruserid.isin(common_muids))]
d = DataFrame()
for i in common_muids:
    final_rec_reduced = final_rec1[final_rec1.id == i]
    useractivity_reduced = useractivity[useractivity.id == i]
    useractivity_reduced_tbids = useractivity_reduced.tbid.unique().tolist()
    final_rec_reduced = final_rec_reduced[~(final_rec_reduced.tbid.isin(useractivity_reduced_tbids))]
    d = d.append(final_rec_reduced)

2 Answers

User

#1 · edited 2024-09-28 21:23:12

Suppose you have two DataFrames:

recommendation_df = pd.DataFrame({'content': {0: 100, 1: 101, 2: 102, 3: 103, 4: 103, 5: 105},
 'id': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 4}})

and

past_data_df = pd.DataFrame({'content': {0: 34, 1: 23, 2: 102, 3: 103, 4: 103, 5: 100},
 'id': {0: 1, 1: 5, 2: 2, 3: 2, 4: 3, 5: 6},
 'random': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}})

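You can do a left join between the two DataFrames. With no "on" argument, merge joins on the columns they share, here id and content:

df = pd.merge(recommendation_df, past_data_df, how='left')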

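Then keep only the rows where the column that comes solely from past_data_df (random) is null. Those rows exist in the recommendation DataFrame but not in the user-activity DataFrame: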

df.loc[df.random.isnull()]
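
For reference, the steps above can be put together as a minimal end-to-end sketch. It uses the same sample data as this answer; the explicit on=["id", "content"] argument and the names merged and unseen are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

recommendation_df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 4],
    "content": [100, 101, 102, 103, 103, 105],
})
past_data_df = pd.DataFrame({
    "id": [1, 5, 2, 2, 3, 6],
    "content": [34, 23, 102, 103, 103, 100],
    "random": [0, 1, 2, 3, 4, 5],
})

# Left join on the shared key columns; recommendations with no match in
# past_data_df get NaN in the columns that come only from past_data_df.
merged = pd.merge(recommendation_df, past_data_df, how="left", on=["id", "content"])

# Keep the unmatched rows and return only the recommendation columns.
unseen = merged.loc[merged["random"].isna(), ["id", "content"]]
print(unseen)
```

Note that this null check assumes the random column never contains missing values in past_data_df itself; if it could, a matched row would be mistaken for an unmatched one.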

User

#2 · edited 2024-09-28 21:23:12

You can add a dummy column to useractivity_ids, then use pandas merge to compare and filter.

In [35]: useractivity_ids['tracker'] = 1

In [39]: reco_ids = reco_ids.merge(useractivity_ids, how='left')

In [40]: reco_ids[reco_ids['tracker'].isnull()].drop('tracker', axis=1)
Out[40]: 
   id  content
0   1      100
1   1      101
5   4      105

As of pandas 0.17, merge has an indicator keyword that lets you do this without a dummy column.

In [47]: (pd.merge(reco_ids, useractivity_ids, how='left', indicator=True)
            .query('_merge == "left_only"'))
Out[47]: 
   id  content     _merge
0   1      100  left_only
1   1      101  left_only
5   4      105  left_only
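
As a rough sketch, the indicator approach can be wrapped in a small helper. The function name drop_seen, its parameters, and the sample frames below are hypothetical and chosen for illustration; the drop_duplicates step is an extra precaution I added so that repeated activity rows cannot multiply recommendation rows in the join.

```python
import pandas as pd

def drop_seen(reco, activity, keys):
    """Return rows of `reco` whose `keys` combination never appears in `activity`."""
    # Deduplicate the activity keys so the left join cannot multiply reco rows.
    seen = activity[keys].drop_duplicates()
    merged = reco.merge(seen, how="left", on=keys, indicator=True)
    # "left_only" marks rows found in reco but not in activity.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

reco = pd.DataFrame({"id": [1, 1, 2, 2, 3, 4],
                     "content": [100, 101, 102, 103, 103, 105]})
# The pair (2, 102) appears twice on purpose to exercise the deduplication.
activity = pd.DataFrame({"id": [1, 5, 2, 2, 2, 3, 6],
                         "content": [34, 23, 102, 102, 103, 103, 100]})

result = drop_seen(reco, activity, keys=["id", "content"])
print(result)
```

For the frames in the question, the equivalent call would presumably be drop_seen(reco, useractivity, keys=["id", "tbid"]), assuming both frames carry those two columns.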
