Match some values in a nested dict, but not others

Published 2024-09-29 23:15:08

I have a nested dictionary like this:

{108: {'Wallmart': {'ca': {'good': 'busy'}}},
 204: {'Wallmart': {'ny': {'good': 'busy'}}},
 205: {'Wallmart': {'ny': {'great': 'busy'}}},
 110: {'CVS': {'ny': {'great': 'busy'}}},
 184: {'Wallmart': {'fl': {'great': 'busy'}}},
 185: {'Wallmart': {'fl': {'bad': 'busy'}}},
 105: {'Wallmart': {'ga': {'bad': 'busy'}}},
 497: {'Wallmart': {'ga': {'bad': 'busy'}}},
 400: {'RiteAid': {'dc': {'good': 'busy'}}},
 406: {'RidaAid': {'dc': {'geat': 'busy'}}},
 367: {'Other': {'tx': {'bad': 'busy'}}}}

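What I need to do is iterate over this data and find the keys whose state is the same but whose store name differs. For example, given the data above, the output should be only: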

 204: {'Wallmart': {'ny': {'good': 'busy'}}},
 205: {'Wallmart': {'ny': {'great': 'busy'}}},
 110: {'CVS': {'ny': {'great': 'busy'}}},

because the state (ny) matches but the store names (Wallmart, CVS) do not.

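After that, I will use a user_input to ask the user of this code whether they want "Wallmart" or "CVS". I have no problem with that part, but extracting the information from this dictionary is what I find difficult.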


3 answers

So what you want to do is group the data by state:

result = {}
for iD, shops in data.items():
    for states in shops.values():
        for state in states:
            # Collect each {id: shops} entry under its state
            result.setdefault(state, []).append({iD: shops})
print(result)

Output:

{'ca': [{108: {'Wallmart': {'ca': {'good': 'busy'}}}}],
 'ny': [{204: {'Wallmart': {'ny': {'good': 'busy'}}}},
        {205: {'Wallmart': {'ny': {'great': 'busy'}}}},
        {110: {'CVS': {'ny': {'great': 'busy'}}}}],
 'fl': [{184: {'Wallmart': {'fl': {'great': 'busy'}}}},
        {185: {'Wallmart': {'fl': {'bad': 'busy'}}}}],
 'ga': [{105: {'Wallmart': {'ga': {'bad': 'busy'}}}},
        {497: {'Wallmart': {'ga': {'bad': 'busy'}}}}],
 'dc': [{400: {'RiteAid': {'dc': {'good': 'busy'}}}},
        {406: {'RidaAid': {'dc': {'geat': 'busy'}}}}],
 'tx': [{367: {'Other': {'tx': {'bad': 'busy'}}}}]}
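The grouping above still contains states where every entry has the same store name (for example fl and ga). A minimal sketch of the remaining filtering step follows, assuming data is the question's dictionary with the store names quoted as strings; the helper name states_with_mixed_stores is hypothetical and not part of the original answer.

```python
data = {108: {'Wallmart': {'ca': {'good': 'busy'}}},
        204: {'Wallmart': {'ny': {'good': 'busy'}}},
        205: {'Wallmart': {'ny': {'great': 'busy'}}},
        110: {'CVS': {'ny': {'great': 'busy'}}},
        184: {'Wallmart': {'fl': {'great': 'busy'}}},
        185: {'Wallmart': {'fl': {'bad': 'busy'}}},
        105: {'Wallmart': {'ga': {'bad': 'busy'}}},
        497: {'Wallmart': {'ga': {'bad': 'busy'}}},
        400: {'RiteAid': {'dc': {'good': 'busy'}}},
        406: {'RidaAid': {'dc': {'geat': 'busy'}}},
        367: {'Other': {'tx': {'bad': 'busy'}}}}


def states_with_mixed_stores(data):
    """Keep entries whose state occurs with more than one distinct store name."""
    # First pass: collect the set of store names seen for each state
    names_by_state = {}
    for shops in data.values():
        for shop, states in shops.items():
            for state in states:
                names_by_state.setdefault(state, set()).add(shop)

    # States that have at least two different store names
    mixed = {state for state, names in names_by_state.items() if len(names) > 1}

    # Second pass: keep the original {id: shops} entries for those states
    return {
        iD: shops
        for iD, shops in data.items()
        if any(state in mixed for states in shops.values() for state in states)
    }


matches = states_with_mixed_stores(data)
print(matches)
```

With the data exactly as posted, the dc entries are returned alongside the ny ones, because 'RiteAid' and 'RidaAid' are spelled differently and therefore count as two store names.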

Your dict structure is:

{id: {name: {state: {level: status}}}, ...}

It would be easier to work with the following structure:

{state1: {id: {store_dict}, ...}, state2: {...}, ...}

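For this conversion, we need to find the state by taking the single key at each level and using it to access the next level. We will also use defaultdict to create the inner dict for each state on demand:

from collections import defaultdict

# defaultdict(dict), not defaultdict(list): entries are stored by id, matching {state: {id: store}}
states = defaultdict(dict)
for _id, store in data.items():
    name = list(store.keys())[0]
    state = list(store[name].keys())[0]
    states[state][_id] = store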

Now you only need to index states with the state you want. Here is an example print function:

def print_state(state):
    for _id, store in states[state].items():
        print(_id, store, sep=': ')

And to use it:

>>> print_state('ny')
204: {'Wallmart': {'ny': {'good': 'busy'}}}
205: {'Wallmart': {'ny': {'great': 'busy'}}}
110: {'CVS': {'ny': {'great': 'busy'}}}
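The question also mentions letting the user pick a store name afterwards. A sketch of that step on top of a {state: {id: store}} mapping follows; filter_by_store is a hypothetical helper, the data is a shortened copy of the question's, and the chosen name is passed as an argument rather than read with input() so the example runs non-interactively.

```python
from collections import defaultdict

# Shortened sample of the question's data (store names quoted)
data = {
    204: {'Wallmart': {'ny': {'good': 'busy'}}},
    205: {'Wallmart': {'ny': {'great': 'busy'}}},
    110: {'CVS': {'ny': {'great': 'busy'}}},
    184: {'Wallmart': {'fl': {'great': 'busy'}}},
}

# Build {state: {id: store}}; each level is assumed to hold exactly one key
states = defaultdict(dict)
for _id, store in data.items():
    name = next(iter(store))
    state = next(iter(store[name]))
    states[state][_id] = store


def filter_by_store(state, wanted):
    """Return the entries of one state whose store name equals `wanted`."""
    # .get() avoids creating an empty entry for an unknown state
    return {
        _id: store
        for _id, store in states.get(state, {}).items()
        if wanted in store
    }


print(filter_by_store('ny', 'CVS'))
# {110: {'CVS': {'ny': {'great': 'busy'}}}}
```

In an interactive script, the second argument could come from input(), for example filter_by_store('ny', input('Store name: ')).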

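I tried to partially solve this with a pandas DataFrame.

The idea is to use recursion to flatten the dictionary into a list, res.

Then res is split into consecutive chunks of 5 elements with divide_chunks_sliding (despite the name, the chunks do not overlap).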

d = {108: {'Wallmart': {'ca': {'good': 'busy'}}},
 204: {'Wallmart': {'ny': {'good': 'busy'}}},
 205: {'Wallmart': {'ny': {'great': 'busy'}}},
 110: {'CVS': {'ny': {'great': 'busy'}}},
 184: {'Wallmart': {'fl': {'great': 'busy'}}},
 185: {'Wallmart': {'fl': {'bad': 'busy'}}},
 105: {'Wallmart': {'ga': {'bad': 'busy'}}},
 497: {'Wallmart': {'ga': {'bad': 'busy'}}},
 400: {'RiteAid': {'dc': {'good': 'busy'}}},
 406: {'RidaAid': {'dc': {'geat': 'busy'}}},
 367: {'Other': {'tx': {'bad': 'busy'}}}}


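def recur_dict(inp, res=None):
    # Use None instead of a mutable default so repeated calls do not share one list
    if res is None:
        res = []
    for x in inp:
        res.append(x)
        if isinstance(inp[x], dict):
            recur_dict(inp[x], res)
        else:
            res.append(inp[x])
    return res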

def divide_chunks_sliding(in_arr,chunk):
    n = len(in_arr)
    i = 0
    while i < n:
        i += chunk
        yield in_arr[i-chunk:i]

##### Divide Chunks Usage Example #####
>>> print(list(divide_chunks_sliding([1,2,3,4,5,6,7,8],2)))

[[1, 2], [3, 4], [5, 6], [7, 8]]

Using the functions above, build df and self-merge it on store:

import pandas as pd

res = recur_dict(d)
values = list(divide_chunks_sliding(res,5))

df = pd.DataFrame(data=values,columns=['Key','Brand','store','review','flag'])

df_merge = pd.merge(df,df[['store','Brand']],on=['store'],suffixes=['_Self_Left','_Self_Right'])


>>> print(df_merge[df_merge['Brand_Self_Left'] != df_merge['Brand_Self_Right']])

    Key Brand_Self_Left store review  flag Brand_Self_Right
3   204        Wallmart    ny   good  busy              CVS
6   205        Wallmart    ny  great  busy              CVS
7   110             CVS    ny  great  busy         Wallmart
8   110             CVS    ny  great  busy         Wallmart
19  400         RiteAid    dc   good  busy          RidaAid
20  406         RidaAid    dc   geat  busy          RiteAid

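After this filter, df_merge contains all rows that share the same store but have a different Brand. Converting the result back to the original nested structure is still pending.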
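One possible way to do that conversion is sketched below in plain Python, assuming each filtered row is available as a (Key, Brand, store, review, flag) tuple. With pandas, such tuples could presumably be obtained by selecting those five columns of the filtered frame and calling itertuples(index=False, name=None); that step is an assumption and is not shown here. The helper name rows_to_nested is hypothetical.

```python
def rows_to_nested(rows):
    """Rebuild {key: {brand: {store: {review: flag}}}} from flat 5-tuples."""
    nested = {}
    for key, brand, store, review, flag in rows:
        # Duplicate rows for the same key simply overwrite an identical entry
        nested[key] = {brand: {store: {review: flag}}}
    return nested


# First five columns of the ny rows from the filtered output above
rows = [
    (204, 'Wallmart', 'ny', 'good', 'busy'),
    (205, 'Wallmart', 'ny', 'great', 'busy'),
    (110, 'CVS', 'ny', 'great', 'busy'),
    (110, 'CVS', 'ny', 'great', 'busy'),
]

restored = rows_to_nested(rows)
print(restored)
```

Because the result is keyed by Key, the repeated row for 110 collapses into a single entry.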
