Detecting and removing duplicates in a JSON list

Posted 2024-09-30 06:34:04

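I have a list of alerts, and sometimes it contains duplicates in German and English. I want to remove the duplicates from the list. That is: if an alert is a duplicate (which I detect by identical "start" and "end" timestamps), its whole record ("description", "event", "start", ...) should be removed from the alerts list. In this case the second entry should be removed: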

{
"alerts": [
    {
        "description": "Es tritt leichter Frost auf.",
        "end": 1613379600,
        "event": "FROST",
        "lang": "de",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613322000
    },
    {
        "description": "There is a risk of frost",
        "end": 1613379600,
        "event": "frost",
        "lang": "en",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613322000
    },
    {
        "description": "There is a risk of wind gusts",
        "end": 1613408400,
        "event": "wind gusts",
        "lang": "en",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613336400
    }]}

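How can I do this in Python and save the new alerts list without duplicates? I think it must be something like this (sorry for the pseudocode, I could not adapt the examples I found, I'm a beginner...). Please help! Thanks a lot.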

for item in data['alerts']:
    if item['start'] == item['start'] and item['end'] == item['end']
        delete

So that I get this output:

 {
"alerts": [
    {
        "description": "Es tritt leichter Frost auf.",
        "end": 1613379600,
        "event": "FROST",
        "lang": "de",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613322000
    },
    {
        "description": "There is a risk of wind gusts",
        "end": 1613408400,
        "event": "wind gusts",
        "lang": "en",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613336400
    }]}

3 answers

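Sort the input list by lang in reverse order, so that en comes before de, then build a dict whose keys are the tuple (start, end) and take dict.values(). Because de comes after en, whenever several alerts share the same (start, end) key, the de entry overwrites the value stored for that key.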

data = {
"alerts": [
    {
        "description": "Es tritt leichter Frost auf.",
        "end": 1613379600,
        "event": "FROST",
        "lang": "de",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613322000
    },
    {
        "description": "There is a risk of frost",
        "end": 1613379600,
        "event": "frost",
        "lang": "en",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613322000
    },
    {
        "description": "There is a risk of wind gusts",
        "end": 1613408400,
        "event": "wind gusts",
        "lang": "en",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613336400
    }]}

unique = {(item['start'], item['end']):item for item in
           sorted(data['alerts'], key=lambda x: x['lang'], reverse=True)}
data['alerts'] = sorted(unique.values(), key=lambda x: (x['start'], x['end']))

Output

{
    "alerts": [
        {
            "description": "Es tritt leichter Frost auf.",
            "end": 1613379600,
            "event": "FROST",
            "lang": "de",
            "sender_name": "DWD / Nationales Warnzentrum Offenbach",
            "start": 1613322000
        },
        {
            "description": "There is a risk of wind gusts",
            "end": 1613408400,
            "event": "wind gusts",
            "lang": "en",
            "sender_name": "DWD / Nationales Warnzentrum Offenbach",
            "start": 1613336400
        }
    ]
}

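I'm not sure whether you need the result sorted by time; if not, you can drop the final sorted() call and use list(unique.values()) instead.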
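
Since the question also asks how to save the result, here is a minimal sketch of the full round trip using the standard json module. It assumes the alerts arrive as a JSON string; the variable names raw and cleaned are made up for this example, and the records are abbreviated to the fields the code actually uses.

```python
import json

# Abbreviated copy of the alerts from the question (only the fields used here).
raw = """
{"alerts": [
    {"event": "FROST", "lang": "de", "start": 1613322000, "end": 1613379600},
    {"event": "frost", "lang": "en", "start": 1613322000, "end": 1613379600},
    {"event": "wind gusts", "lang": "en", "start": 1613336400, "end": 1613408400}
]}
"""

data = json.loads(raw)

# "en" sorts after "de", so reverse=True puts English first and German last;
# a later entry with the same (start, end) key overwrites the earlier one.
unique = {
    (alert["start"], alert["end"]): alert
    for alert in sorted(data["alerts"], key=lambda a: a["lang"], reverse=True)
}
data["alerts"] = list(unique.values())

# Serialize the cleaned structure; write this string to a file to save it.
cleaned = json.dumps(data, indent=4, ensure_ascii=False)
print(cleaned)
```

ensure_ascii=False keeps any non-ASCII characters in German descriptions readable in the output instead of escaping them.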

You can filter with a dict comprehension:

data = {
"alerts": [
    {
        "description": "Es tritt leichter Frost auf.",
        "end": 1613379600,
        "event": "FROST",
        "lang": "de",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613322000
    },
    {
        "description": "There is a risk of frost",
        "end": 1613379600,
        "event": "frost",
        "lang": "en",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613322000
    },
    {
        "description": "There is a risk of wind gusts",
        "end": 1613408400,
        "event": "wind gusts",
        "lang": "en",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1613336400
    }]}

filtered = {(entry["start"], entry["end"]): entry for entry in reversed(data["alerts"])}

data["alerts"] = list(filtered.values())

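This approach relies on the fact that a repeated dict key is overwritten by the last entry that uses it. If you want to keep the last duplicate instead of the first one, remove reversed().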
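
One side effect of the reversed() version is that the surviving alerts come out in reverse input order. If the original order matters, an explicit loop with a set of already-seen keys is an alternative. This is a sketch of a different technique, not the code from the answer above; the function name drop_duplicate_alerts is made up for illustration, and the sample records are abbreviated.

```python
def drop_duplicate_alerts(alerts):
    """Keep the first alert for each (start, end) pair, preserving input order."""
    seen = set()
    kept = []
    for alert in alerts:
        key = (alert["start"], alert["end"])
        if key not in seen:
            seen.add(key)
            kept.append(alert)
    return kept


# Abbreviated copy of the alerts from the question.
alerts = [
    {"event": "FROST", "lang": "de", "start": 1613322000, "end": 1613379600},
    {"event": "frost", "lang": "en", "start": 1613322000, "end": 1613379600},
    {"event": "wind gusts", "lang": "en", "start": 1613336400, "end": 1613408400},
]

kept = drop_duplicate_alerts(alerts)
print([a["event"] for a in kept])
```

With the question's data this keeps the German frost alert and the wind gusts alert, in that order.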

You can use itertools.groupby [Python-docs] to group all alerts that share the same timestamps, then pick the English entries:

from itertools import groupby

data["alerts"] = sorted(data["alerts"], key=lambda x: (x["end"], x["start"]))
data["alerts"] = [
    g
    for key, group in groupby(data["alerts"], key=lambda x: (x["end"], x["start"]))
    for g in group
    if g["lang"] == "en"  # change accordingly
]
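
As written, the comprehension above keeps every English entry and drops any group that has no English entry at all. A possible variant, sketched here under the assumption that exactly one alert per (start, end) group is wanted, picks the preferred language when it is present and otherwise falls back to the first entry of the group. The function name pick_per_group and its preferred parameter are hypothetical, and the sample records are abbreviated.

```python
from itertools import groupby


def pick_per_group(alerts, preferred="en"):
    """Return one alert per (start, end) group, preferring the given language."""
    def time_key(alert):
        return (alert["start"], alert["end"])

    result = []
    # groupby only groups adjacent items, so the list must be sorted by the same key.
    for _, group in groupby(sorted(alerts, key=time_key), key=time_key):
        group = list(group)
        # First entry in the preferred language, else the first entry of the group.
        chosen = next((a for a in group if a["lang"] == preferred), group[0])
        result.append(chosen)
    return result


# Abbreviated copy of the alerts from the question.
alerts = [
    {"event": "FROST", "lang": "de", "start": 1613322000, "end": 1613379600},
    {"event": "frost", "lang": "en", "start": 1613322000, "end": 1613379600},
    {"event": "wind gusts", "lang": "en", "start": 1613336400, "end": 1613408400},
]

english_first = pick_per_group(alerts, preferred="en")
german_first = pick_per_group(alerts, preferred="de")
```

With preferred="de" the wind gusts alert is still kept, because its group contains no German entry and the fallback applies.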
