How to remove duplicate elements based on d

Posted 2024-09-26 04:48:51


I have a dictionary whose values are lists of lists:

find_dup = {"one":[["1654","raj","425","16-02-2017"],["1654","mo","426","20-02-2017"],["1654","ss","425","20-02-2017"],["1654","vs","427","20-02-2017"],["1654","ss","425","14-02-2017"]]}

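I want to find duplicates among these lists, where two lists count as duplicates if their first and third elements match.

For example: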

^{pr2}$

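In the elements above, the pair 1654, 425 is duplicated (because I want to find duplicates based on the first and third elements).

So, in the list above, these lists are duplicates of each other: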

["1654","raj","425","16-02-2017"] -> 1654,425
["1654","ss","425","20-02-2017"] -> 1654,425
["1654","ss","425","14-02-2017"] -> 1654,425

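Now I want to remove the 2 lists with the older dates from this group (the last element of each list is the date).

These 2 lists have older dates, so they should be removed: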

["1654","raj","425","16-02-2017"] -> 1654,425
["1654","ss","425","14-02-2017"] -> 1654,425

The result should look like this:

find_dup = {"one":[["1654","mo","426","20-02-2017"],["1654","ss","425","20-02-2017"],["1654","vs","427","20-02-2017"]]}

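I have a Python script that iterates over the list, but I can't work out the logic: when I find a duplicate, how do I pop the older element and keep the one with the latest date?

This is my failed script: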

find_dup = {"one":[["1654","raj","425","16-02-2017"],["1654","mo","426","20-02-2017"],["1654","ss","425","20-02-2017"],["1654","vs","427","20-02-2017"],["1654","ss","425","14-02-2017"]]}


for d in find_dup:
    len_d = len(find_dup[d])
    store_array_dup = []
    store_array_ele = {}
    for i in find_dup[d]:

        val = i[0]+"-"+i[1]+"-"+i[2]+"-"+i[3]
        val_1 = i[0]+"-"+i[2]
        if val_1 in store_array_dup:
            store_array_ele.append(val_1)
        else:
            arrs = []
            arrs.append(val)
            store_array_ele[d] = arrs

How can I get a result like this?

find_dup = {"one":[["1654","mo","426","20-02-2017"],["1654","ss","425","20-02-2017"],["1654","vs","427","20-02-2017"]]}

3 answers

This is your dataset:

find_dup = {"one":[
                      ["1654","raj","425","16-02-2017"],
                      ["1654","mo","426","20-02-2017"],
                      ["1654","ss","425","20-02-2017"],
                      ["1654","vs","427","20-02-2017"],
                      ["1654","ss","425","14-02-2017"]
                   ]
            }

You can build a new dict from the dataset, keyed on the first and third elements, with the entries processed in date order:

^{pr2}$

Output:

>>> print(new_dict.values())
[['1654', 'vs', '427', '20-02-2017'], ['1654', 'mo', '426', '20-02-2017'], ['1654', 'ss', '425', '20-02-2017']]
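The snippet for this answer did not survive on this page, so the following is only a minimal sketch of the approach as described, not the answerer's original code. It assumes the dates are always in `DD-MM-YYYY` format; the `parse_date` helper is a hypothetical name introduced here. Rows are visited from oldest to newest, so a later row with the same (first, third) key overwrites an earlier one. On Python 3.7+ the values come out in key-insertion order, which can differ from the order printed above.

```python
from datetime import datetime

find_dup = {"one": [
    ["1654", "raj", "425", "16-02-2017"],
    ["1654", "mo", "426", "20-02-2017"],
    ["1654", "ss", "425", "20-02-2017"],
    ["1654", "vs", "427", "20-02-2017"],
    ["1654", "ss", "425", "14-02-2017"],
]}


def parse_date(text):
    # Hypothetical helper: assumes DD-MM-YYYY strings.
    return datetime.strptime(text, "%d-%m-%Y")


new_dict = {}
# Visit rows from oldest to newest so the newest row wins for each key.
for row in sorted(find_dup["one"], key=lambda r: parse_date(r[3])):
    new_dict[(row[0], row[2])] = row

result = list(new_dict.values())
print(result)
```

Parsing with `datetime.strptime` rather than comparing the raw strings matters here, because `DD-MM-YYYY` strings do not sort chronologically as text.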

First, solve the problem for a single list of lists:

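def mounarajan_no_dup(l):
    dedup = {}
    for i in l:
        # Key on the first and third elements, which define a duplicate.
        k = (i[0], i[2])
        if k not in dedup:
            dedup[k] = i
        else:
            # Rearrange DD-MM-YYYY into YYYYMMDD so that string
            # comparison orders the values chronologically.
            j3 = dedup[k][3]
            jdate = j3[6:10] + j3[3:5] + j3[0:2]
            i3 = i[3]
            idate = i3[6:10] + i3[3:5] + i3[0:2]
            # Keep whichever entry has the later date.
            if jdate < idate:
                dedup[k] = i
    return list(dedup.values())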

Then apply it to each entry of find_dup.

^{pr2}$
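The snippet for this step is missing on this page, so here is a hedged sketch of what applying a per-list function to every entry could look like. To keep the example self-contained it uses `latest_per_key`, a hypothetical stand-in that is assumed to behave the way the function above is intended to (one row per first/third pair, keeping the latest date); it is not the answerer's code.

```python
from datetime import datetime


def latest_per_key(rows):
    # Hypothetical stand-in: keep, for each (first, third) pair,
    # the row with the latest DD-MM-YYYY date.
    best = {}
    for row in rows:
        key = (row[0], row[2])
        date = datetime.strptime(row[3], "%d-%m-%Y")
        if key not in best or date > best[key][0]:
            best[key] = (date, row)
    return [row for _, row in best.values()]


find_dup = {"one": [
    ["1654", "raj", "425", "16-02-2017"],
    ["1654", "mo", "426", "20-02-2017"],
    ["1654", "ss", "425", "20-02-2017"],
    ["1654", "vs", "427", "20-02-2017"],
    ["1654", "ss", "425", "14-02-2017"],
]}

# Build a new dict instead of mutating find_dup while iterating over it.
deduped = {name: latest_per_key(rows) for name, rows in find_dup.items()}
print(deduped)
```

Building a new dict leaves the original `find_dup` untouched, which makes it easy to compare the input and the result.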

I suggest sorting the list by the tuple (first element, third element, date), so that within each group the latest date comes last, then grouping the sorted list by the first and third elements, and finally taking the last element of each group:

from itertools import groupby
from operator import itemgetter
from datetime import datetime

find_dup = {"one":[["1654","raj","425","16-02-2017"],["1654","mo","426","20-02-2017"],["1654","ss","425","20-02-2017"],["1654","vs","427","20-02-2017"],["1654","ss","425","14-02-2017"]]}

find_dup_sorted = sorted(find_dup["one"], key=lambda x: (x[0], x[2], datetime.strptime(x[3], "%d-%m-%Y")))

result = {"one": []}

for k, g in groupby(find_dup_sorted, key=itemgetter(0, 2)):
    # Each group is in ascending date order, so its last row is the latest.
    result["one"].append(list(g)[-1])

print(result)
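The expected result in the question keeps the surviving rows in their original positions, which sorting does not guarantee. As an alternative, here is a minimal sketch that records the index of the latest row for each (first, third) pair and then filters the list in place order. The function name `keep_latest_in_place` is hypothetical, and the code assumes `DD-MM-YYYY` dates; when two rows tie on date, the earlier one is kept.

```python
from datetime import datetime


def keep_latest_in_place(rows):
    # Map each (first, third) key to (latest date seen, index of that row).
    best = {}
    for idx, row in enumerate(rows):
        key = (row[0], row[2])
        date = datetime.strptime(row[3], "%d-%m-%Y")
        if key not in best or date > best[key][0]:
            best[key] = (date, idx)
    keep = {idx for _, idx in best.values()}
    # Filter by index so the survivors keep their original order.
    return [row for idx, row in enumerate(rows) if idx in keep]


find_dup = {"one": [
    ["1654", "raj", "425", "16-02-2017"],
    ["1654", "mo", "426", "20-02-2017"],
    ["1654", "ss", "425", "20-02-2017"],
    ["1654", "vs", "427", "20-02-2017"],
    ["1654", "ss", "425", "14-02-2017"],
]}

print({name: keep_latest_in_place(rows) for name, rows in find_dup.items()})
```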
