Python: filter a dictionary (by a specific filter tuple)

Published 2024-05-19 06:23:26


EDITED: input and output updated based on the comments to make the question clearer.

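I have a dictionary with unique keys, but some of them represent different tiers of the same dataset (apart from the last two characters, they have the same name).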

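Edit: not every dataset is produced at every tier, so some datasets may only be available at tier "T1" while others are available at several tiers (end of edit).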
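A minimal sketch to make the naming convention concrete. The helper `split_key` is hypothetical, and it assumes the tier is always exactly the last two characters of the key:

```python
def split_key(key):
    """Split a key into (dataset name, tier), assuming a two-character tier suffix."""
    return key[:-2], key[-2:]


base, tier = split_key('LC08_L1TP_200029_20210716_20210721_02_T1')
print(base)  # LC08_L1TP_200029_20210716_20210721_02_
print(tier)  # T1
```

Two keys belong to the same dataset exactly when their `base` parts are equal.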

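I also have a tuple containing the tier levels. Now I want to filter the dictionary so that, for each dataset, it only contains the "best" available tier. The tier is part of the key, but it can also be read from the value of each dictionary entry. Here is an MWE: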

my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {  # best tier for this dataset --> keep it
        'cc': 30.57,
        'tier': 'T1',
    },
    'LC08_L1TP_200029_20210716_20210721_02_RT': {  # worst tier for this dataset --> remove it
        'cc': 30.57,
        'tier': 'RT',
    },
    'LC08_L1TP_200029_20210630_20210708_02_T2': {  # worst tier for this dataset --> remove it
        'cc': 60.52,
        'tier': 'T2',
    },
    'LC08_L1TP_200029_20210630_20210708_02_RT': {  # best tier for this dataset --> keep it
        'cc': 60.52,
        'tier': 'RT',
    },
    'LC08_L1TP_200029_20210614_20210628_02_T2': {  # only tier for this dataset --> keep it
        'cc': 15.61,
        'tier': 'T2',
    },
}
tiers = ('T1', 'RT', 'T2')  # this is the tier order

In the end, I want a new dictionary that looks like this, containing only the "best" available tier according to tiers:

{
    'LC08_L1TP_200029_20210716_20210721_02_T1': {
        'cc': 30.57,
        'tier': 'T1',
    },
    'LC08_L1TP_200029_20210630_20210708_02_RT': {
        'cc': 60.52,
        'tier': 'RT',
    },
    'LC08_L1TP_200029_20210614_20210628_02_T2': {
        'cc': 15.61,
        'tier': 'T2',
    },
}

I know about the key=lambda x argument for sorting, as described in "How do I sort a list of dictionaries by a value of the dictionary?", but sorting alone is not my goal.

I also came up with something like this, but it obviously doesn't work the way I need it to:

new_dict = {}
for key in my_dict.keys():
    for tier in tiers:
        if key.endswith(tier):
            new_dict[key] = my_dict[key]
            break
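The loop above keeps every entry because every key ends with one of the tiers. Here is a minimal sketch of one way to get the desired output instead. It assumes, as the question states, that the dataset name is the key minus its last two characters and that `tiers` is ordered best-first. For each dataset name, it remembers the entry with the lowest tier index. The names `rank` and `best` are illustrative only:

```python
my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57, 'tier': 'T1'},
    'LC08_L1TP_200029_20210716_20210721_02_RT': {'cc': 30.57, 'tier': 'RT'},
    'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52, 'tier': 'T2'},
    'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52, 'tier': 'RT'},
    'LC08_L1TP_200029_20210614_20210628_02_T2': {'cc': 15.61, 'tier': 'T2'},
}
tiers = ('T1', 'RT', 'T2')  # best tier first

# Position in the tuple = quality: a lower index is a better tier.
rank = {tier: i for i, tier in enumerate(tiers)}

# Maps dataset name (key without the tier suffix) -> key of the best entry seen so far.
best = {}
for key, value in my_dict.items():
    name = key[:-2]
    current = best.get(name)
    if current is None or rank[value['tier']] < rank[my_dict[current]['tier']]:
        best[name] = key

new_dict = {key: my_dict[key] for key in best.values()}
print(new_dict)
```

This is a single pass over the dictionary with no sorting. A tier value that is missing from `tiers` would raise `KeyError` at `rank[...]`.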

3 answers

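As far as I can tell, "best" means the entry with the largest cc value.

  • First sort the dictionary by the cc key (with sorted()) to make the filtering easier
  • Iterate over the tiers tuple and the sorted dictionary, storing the matching item for each tier in a new dictionary, new_dict
  • I used a visited set to avoid visiting a tier again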

Edit

You don't need to use a set; a break is enough (based on @Xitiz's comment).

Here is the code:

my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {
        'cc': 30.57,
        'tier': 'T1',
    },
    'LC08_L1TP_200029_20210716_20210721_02_RT': {
        'cc': 30.57,
        'tier': 'RT',
    },
    'LC08_L1TP_200029_20210630_20210708_02_T2': {
        'cc': 60.52,
        'tier': 'T2',
    },
    'LC08_L1TP_200029_20210630_20210708_02_RT': {
        'cc': 60.52,
        'tier': 'RT',
    }
}
tiers = ('T1', 'RT', 'T2')  # this is the tier order

# Sorting the dict based on 'cc' in descending order
my_dict = dict(sorted(my_dict.items(), key=lambda x: -x[1]['cc']))
new_dict = {}

# For each tier, take the first (i.e. highest-cc) entry of that tier
for i in tiers:
    for k, v in my_dict.items():
        if v['tier'] == i:
            new_dict[k] = v
            break

print(new_dict)

Output:

{
    'LC08_L1TP_200029_20210716_20210721_02_T1': {
        'cc': 30.57,
        'tier': 'T1'
    },
    'LC08_L1TP_200029_20210630_20210708_02_RT': {
        'cc': 60.52,
        'tier': 'RT'
    },
    'LC08_L1TP_200029_20210630_20210708_02_T2': {
        'cc': 60.52,
        'tier': 'T2'
    }
}
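The same per-tier selection (for each tier, the entry with the largest `cc`) can also be written without sorting the whole dictionary first, by calling the built-in `max` on each tier's entries. This sketch reuses the answer's four-entry `my_dict`. `default=None` lets a tier with no entries be skipped instead of raising `ValueError`:

```python
my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57, 'tier': 'T1'},
    'LC08_L1TP_200029_20210716_20210721_02_RT': {'cc': 30.57, 'tier': 'RT'},
    'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52, 'tier': 'T2'},
    'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52, 'tier': 'RT'},
}
tiers = ('T1', 'RT', 'T2')

new_dict = {}
for tier in tiers:
    # All (key, value) pairs belonging to this tier.
    candidates = (kv for kv in my_dict.items() if kv[1]['tier'] == tier)
    # max() returns the first pair with the largest cc, or None if the tier is empty.
    best = max(candidates, key=lambda kv: kv[1]['cc'], default=None)
    if best is not None:
        key, value = best
        new_dict[key] = value

print(new_dict)
```

On ties, `max` keeps the first maximal item, just as the stable sort followed by `break` does. The result is therefore the same as the output above.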

You can use itertools.groupby for this task:

import itertools
from pprint import pprint

tiers = {'T1':1, 'RT':2, 'T2':3 }  # this is the tier order

data = {}
# groupby only merges adjacent items, so sort by the tier first
by_tier = sorted(my_dict.items(), key=lambda kv: kv[1]['tier'])
for tier, group in itertools.groupby(by_tier, key=lambda kv: kv[1]['tier']):
    # keep the entry with the largest cc in this tier
    max_item = max(group, key=lambda kv: kv[1]['cc'])
    data[tier] = {max_item[0]: max_item[1]}

pprint(data)

Output:
{'RT': {'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52,
                                                     'tier': 'RT'}},
 'T1': {'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57,
                                                     'tier': 'T1'}},
 'T2': {'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52,
                                                     'tier': 'T2'}}}

For the first version of the question:

import itertools

tiers = {'T1':1, 'RT':2, 'T2':3 }  # this is the tier order

# sort by tier rank so that groupby sees each tier as one consecutive run
by_tier = sorted(my_dict.items(), key=lambda kv: tiers[kv[1]['tier']])
for tier, group in itertools.groupby(by_tier, key=lambda kv: kv[1]['tier']):
    print("for tier {0}".format(tier))
    for item in group:
        print("  ==> {0}".format(item))

Output:
for tier T1
  ==> ('LC08_L1TP_200029_20210716_20210721_02_T1', {'cc': 30.57, 'tier': 'T1'})
for tier RT
  ==> ('LC08_L1TP_200029_20210716_20210721_02_RT', {'cc': 30.57, 'tier': 'RT'})
  ==> ('LC08_L1TP_200029_20210630_20210708_02_RT', {'cc': 60.52, 'tier': 'RT'})
for tier T2
  ==> ('LC08_L1TP_200029_20210630_20210708_02_T2', {'cc': 60.52, 'tier': 'T2'})

Now you can easily produce the desired format.
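The current version of the question asks for one entry per dataset rather than one per tier. For that format, the same `groupby` pattern can group on the dataset name instead (the key without its last two characters) and keep the member with the best tier. A sketch under that assumption follows. `tier_rank` mirrors the answer's `tiers` dict, and `dataset_name` is an illustrative helper:

```python
import itertools

my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57, 'tier': 'T1'},
    'LC08_L1TP_200029_20210716_20210721_02_RT': {'cc': 30.57, 'tier': 'RT'},
    'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52, 'tier': 'T2'},
    'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52, 'tier': 'RT'},
    'LC08_L1TP_200029_20210614_20210628_02_T2': {'cc': 15.61, 'tier': 'T2'},
}
tier_rank = {'T1': 1, 'RT': 2, 'T2': 3}  # lower number = better tier


def dataset_name(kv):
    # kv is a (key, value) pair; the tier is the last two characters of the key.
    return kv[0][:-2]


data = {}
# groupby needs its input sorted by the same key it groups on.
pairs = sorted(my_dict.items(), key=dataset_name)
for name, group in itertools.groupby(pairs, key=dataset_name):
    # Within one dataset, keep the entry whose tier ranks best.
    key, value = min(group, key=lambda kv: tier_rank[kv[1]['tier']])
    data[key] = value

print(data)
```

The entries come out in dataset-name order rather than the original insertion order. The contents match the desired dictionary in the question.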

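You can break this problem down as follows:

  • Get the unique dataset names:
    • Extract the keys from the dictionary: k = list(my_dict.keys())
    • Remove the tier: ds = map(lambda x: x[:-2], k)
    • Create a list containing only the unique names: ds = list(set(ds))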
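The three steps above can also be collapsed into a single set comprehension. This sketch uses short, made-up keys instead of the real dataset names:

```python
my_dict = {
    'A_T1': {'tier': 'T1'},
    'A_RT': {'tier': 'RT'},
    'B_T2': {'tier': 'T2'},
}

# Strip the two-character tier suffix from every key; the set drops duplicates.
ds = {key[:-2] for key in my_dict}
print(sorted(ds))  # ['A_', 'B_']
```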

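Then go through the dictionary, check which keys (dataset name + tier) actually exist in it, and pick the best available dataset. If you do this in the correct tier order, you will get the correct result: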

highest_tiers = []
for d in ds:
    # tiers is ordered best-first, so the first existing key is the best tier
    for t in tiers:
        k_t = d + t
        if k_t in my_dict:
            highest_tiers.append(k_t)
            break
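Put together, this approach can be written as a self-contained sketch that also builds the filtered dictionary. It assumes, as in the question, that `tiers` is ordered best-first. Under that assumption, the first "dataset name + tier" combination found in `my_dict` is the best one. `next(..., None)` returns `None` if no tier matches:

```python
my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57, 'tier': 'T1'},
    'LC08_L1TP_200029_20210716_20210721_02_RT': {'cc': 30.57, 'tier': 'RT'},
    'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52, 'tier': 'T2'},
    'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52, 'tier': 'RT'},
    'LC08_L1TP_200029_20210614_20210628_02_T2': {'cc': 15.61, 'tier': 'T2'},
}
tiers = ('T1', 'RT', 'T2')  # best tier first

ds = {key[:-2] for key in my_dict}  # unique dataset names

new_dict = {}
for d in sorted(ds):  # sorted only to make the output order deterministic
    # First tier (in best-first order) whose full key exists in my_dict.
    best_key = next((d + t for t in tiers if d + t in my_dict), None)
    if best_key is not None:
        new_dict[best_key] = my_dict[best_key]

print(new_dict)
```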
