在Python中删除列表中的重复元素

2024-07-04 05:51:12 发布

您现在位置:Python中文网/ 问答频道 /正文

我有这样一个清单:

[{'score': '92', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
 {'score': '61', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'}]

我想根据它们的imageId删除重复的图像。因此,在上述示例中,imageID 6184de26-e11d-4a7e-9c44-a1af8012d8d0出现了2次(保留得分最高的一次)。你知道吗

在Python中如何做到这一点?你知道吗


Tags: 图像示例labelscoreslidingdoorimageida1af8012d8d0
3条回答

我假设你想保留分数最高的条目。试试这个:

my_list = [
    {'score': '92', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
    {'score': '61', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'}
]

by_id = {}
for element in my_list:
   imageId = element['imageId']
   if imageId in by_id:
       if int(by_id[imageId]['score']) < int(element['score']):
           # Replace because of higher score
           by_id[imageId] = element
   else:
       # Insert new element
       by_id[imageId] = element

print(list(by_id.values()))

我建议您稍微加强示例,以便:

  • 它测试数值比较
  • 它有不连续的“重复”元素

我将创建一个标记dict,id作为键,sublist作为值。循环输入,如果值更高,则覆盖dict条目(不要忘记强制转换为整数)

my_list = [
    {'score': '192', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
    {'score': '61', 'imageId': 'fffffe26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'misc'},
    {'score': '761', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'},
    {'score': '45', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
]

import collections

d = dict()

for subdict in my_list:
    score = int(subdict['score'])
    image_id = subdict['imageId']
    if image_id not in d or int(d[image_id]['score']) < score:
        d[image_id] = subdict

new_list = list(d.values())

结果(使用词典时顺序可能会改变):

[{'imageId': 'fffffe26-e11d-4a7e-9c44-a1af8012d8d0',
  'label': 'misc',
  'score': '61'},
 {'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0',
  'label': 'Sliding Door',
  'score': '761'}]

如果你有大量的数据,就用1.数据帧它的清洁阅读和维护。你知道吗

import pandas as pd

my_list = [
    {'score': '192', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
    {'score': '61', 'imageId': 'fffffe26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'misc'},
    {'score': '761', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'},
    {'score': '45', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
]

# create dataframe
df = pd.DataFrame(my_list)

# your score is string! convert it to int
df['score'] = df['score'].astype('int')

# sort values
df = df.sort_values(by=['imageId', 'score'], ascending=False)

# drop duplicates
df = df.drop_duplicates('imageId', keep='first')


    imageId                                 label           score
1   fffffe26-e11d-4a7e-9c44-a1af8012d8d0    misc            61
2   6184de26-e11d-4a7e-9c44-a1af8012d8d0    Sliding Door    761

相关问题 更多 >

    热门问题