我有一个大约550万行和13列的大型数据框架。我想根据两列的值筛选该表:“product\u id”和“return\u reason”
如果product_id和return_reason都有重复的值,我想选择整行
例如,假设我有以下df:
Index product_id return_reason col3 col4 col5
1 A0001 Size_too_big string31 string41 string51
2 A0001 Size_too_big string32 string42 string52
3 A0001 Defective string33 string43 string53
4 B0002 Defective string34 string44 string54
5 B0002 Defective string35 string45 string55
6 B0002 Changed_mind string36 string46 string56
7 C0003 Size_small string37 string47 string57
8 D0004 Size_small string38 string48 string58
9 D0004 Size_too_big string39 string49 string59
10 D0004 Size_small string310 string410 string510
我希望对其进行过滤,以便获得:
Index product_id return_reason col3 col4 col5
1 A0001 Size_too_big string31 string41 string51
2 A0001 Size_too_big string32 string42 string52
4 B0002 Defective string34 string44 string54
5 B0002 Defective string35 string45 string55
8 D0004 Size_small string38 string48 string58
10 D0004 Size_small string310 string410 string510
我已尝试使用groupby函数:
dup_df = df.groupby(['product_id', 'return_reason']).size().to_frame('count').reset_index()
ismorethanNcount = dup_df['count'] > 1
dup_morethanN = dup_df[ismorethanNcount]
print(dup_morethanN)
这导致:
Index product_id return_reason count
1 A0001 Size_too_big 2
4 B0002 Defective 2
8 D0004 Size_small 2
然后,我将新DataFrame中“product_id”和“return_reason”列的值放入列表中:
list_productid = list(dup_morethanN['product_id'])
list_returnreason = list(dup_morethanN['return_reason'])
然后,我使用上述列表作为筛选df的标准:
new_df = df[df['product_id'].isin(list_productid) & df['return_reason'].isin(list_returnreason)]
这使得:
Index product_id return_reason col3 col4 col5
1 A0001 Size_too_big string31 string41 string51
2 A0001 Size_too_big string32 string42 string52
3 A0001 Defective string33 string43 string53
4 B0002 Defective string34 string44 string54
5 B0002 Defective string35 string45 string55
8 D0004 Size_small string38 string48 string58
9 D0004 Size_too_big string39 string49 string59
10 D0004 Size_small string310 string410 string510
如您所见,此方法根据我给出的列表独立筛选“产品id”和“退货原因”。我应该怎么做
多谢各位
如果需要所有行都被2列复制,请在^{} 中使用^{} 和
keep=False
:或者,如果需要按^{} 按^{} 按^{} 按^{} 过滤的2行2列的行:
相关问题 更多 >
编程相关推荐