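I have a dataset that I need to filter down to "unique" occurrences. Basically, I want to remove every row where the same user buys the same product more than once on the same day, regardless of any change in device. When there are multiple occurrences, I want to keep only the first row.
Data: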
datetime, device, product, user
[
['2013-07-08 15:00:00', 'pc', 'X', 'A'],
['2013-07-09 17:00:00', 'pc', 'X', 'A'],
['2013-07-09 10:00:00', 'andr', 'Y', 'B'],
['2013-07-10 18:00:00', 'pc', 'Y', 'B'],
['2013-07-10 21:00:00', 'ipho', 'Y', 'B'], <- second occurrence of B getting Y that day
['2013-07-10 22:00:00', 'andr', 'Y', 'B'], <- third occurrence of B getting Y that day
['2013-07-10 02:00:00', 'ipho', 'Z', 'C'],
['2013-07-10 11:00:00', 'pc', 'Z', 'C'] <- second occurrence of C getting Z that day
]
Should be filtered to:
['2013-07-08 15:00:00', 'pc', 'X', 'A'],
['2013-07-09 17:00:00', 'pc', 'X', 'A'],
['2013-07-09 10:00:00', 'andr', 'Y', 'B'],
['2013-07-10 18:00:00', 'pc', 'Y', 'B'],
['2013-07-10 02:00:00', 'ipho', 'Z', 'C']
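How can I do this?
Strip the time part from the datetime, then store each item in a dictionary if its key is not already there. Use a tuple of (date, product, user) as the dictionary key.
For example: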
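A minimal sketch of the dictionary approach described above, assuming the rows are plain lists in the column order datetime, device, product, user, and that "first" means first in list order rather than earliest timestamp. The names `rows` and `keep_first_per_day` are made up for illustration, and the code relies on dictionaries preserving insertion order (Python 3.7+).

```python
from datetime import datetime

rows = [
    ['2013-07-08 15:00:00', 'pc', 'X', 'A'],
    ['2013-07-09 17:00:00', 'pc', 'X', 'A'],
    ['2013-07-09 10:00:00', 'andr', 'Y', 'B'],
    ['2013-07-10 18:00:00', 'pc', 'Y', 'B'],
    ['2013-07-10 21:00:00', 'ipho', 'Y', 'B'],
    ['2013-07-10 22:00:00', 'andr', 'Y', 'B'],
    ['2013-07-10 02:00:00', 'ipho', 'Z', 'C'],
    ['2013-07-10 11:00:00', 'pc', 'Z', 'C'],
]


def keep_first_per_day(rows):
    """Keep the first row seen for each (date, product, user) key."""
    seen = {}
    for row in rows:
        timestamp, _device, product, user = row
        # Drop the time part so that only the calendar day is compared.
        day = datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S').date()
        key = (day, product, user)
        # Later rows with the same key are skipped; the device is ignored.
        if key not in seen:
            seen[key] = row
    # Insertion order is preserved, so the rows come back in input order.
    return list(seen.values())


filtered = keep_first_per_day(rows)
for row in filtered:
    print(row)
```

With the sample data this keeps five rows: the three rows annotated as second or third occurrences are dropped. If the input is not guaranteed to be in chronological order and the earliest purchase should win, sort the rows by their timestamp string before filtering.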