Difficulty creating a new list grouped by key using itertools in Python

Posted 2024-09-27 07:27:17


I have the following list of dictionaries:

dataset={"users": [
    {"id": 20, "loc": "Chicago", "st":"4", "sectors": [{"sname": "Retail"}, {"sname": "Manufacturing"}, {"sname": None}]}, 
    {"id": 21, "loc": "Frankfurt", "st":"4", "sectors": [{"sname": None}]}, 
    {"id": 22, "loc": "Berlin", "st":"6", "sectors": [{"sname": "Manufacturing"}, {"sname": "Banking"},{"sname": "Agri"}]}, 
    {"id": 23, "loc": "Chicago", "st":"2", "sectors": [{"sname": "Banking"}, {"sname": "Agri"}]},
    {"id": 24, "loc": "Bern", "st":"1", "sectors": [{"sname": "Retail"}, {"sname": "Agri"}]},
    {"id": 25, "loc": "Bern", "st":"4", "sectors": [{"sname": "Retail"}, {"sname": "Agri"}, {"sname": "Banking"}]}
    ]}

I tried the code below to strip the st and sectors fields from the list above, so that each entry contains only id and loc:

import itertools

fs_loc = []
for g, items in itertools.groupby(dataset['users'], lambda x: (x['id'], x['loc'])):
    fs_loc.append({'id': g[0], 'loc': g[1]})
print(fs_loc)

From here, how can I create a new list that groups the entries by location, with the list of ids and their count for each location, as shown below?

{"locations": [
    {"loc": "Chicago","count":2,"ids": [{"id": "20"}, {"id": "23"}]}, 
    {"loc": "Bern","count":2,"ids": [{"id": "24"}, {"id": "25"}]}, 
    {"loc": "Frankfurt","count":1,"ids": [{"id": "21"}]}, 
    {"loc": "Berlin","count":1,"ids": [{"id": "22"}]}    
    ]}

I am having trouble producing the list above with itertools. I may be missing a better way to achieve this, so any suggestions are welcome.


1 answer
User
#1 · Posted 2024-09-27 07:27:17

You need to pass a sorted sequence to itertools.groupby.

According to the itertools.groupby documentation:

... Generally, the iterable needs to already be sorted on the same key function.

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order.
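To illustrate the quoted behaviour, here is a minimal sketch. The `locs` list is a hypothetical sample that simply repeats the "loc" values in the order they appear in the question's dataset; it is not part of the original code.

```python
import itertools

# Hypothetical sample: the "loc" values in the order they occur in the question.
locs = ["Chicago", "Frankfurt", "Berlin", "Chicago", "Bern", "Bern"]

# Unsorted input: a new group starts every time the value changes,
# so "Chicago" shows up as two separate groups of size 1.
unsorted_groups = [(key, len(list(grp))) for key, grp in itertools.groupby(locs)]

# Sorted input: equal values are adjacent, so each location forms one group.
sorted_groups = [(key, len(list(grp))) for key, grp in itertools.groupby(sorted(locs))]

print(unsorted_groups)
print(sorted_groups)
```

With the unsorted list, Chicago is reported twice with a count of 1 each; after sorting, it is reported once with a count of 2.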

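import itertools

# Key function: group users by their location.
byloc = lambda x: x['loc']

# Sort by the same key first so that equal locations are adjacent, then
# materialize each group, because groupby yields one-shot iterators.
it = (
    (loc, list(user_grp))
    for loc, user_grp in itertools.groupby(
        sorted(dataset['users'], key=byloc), key=byloc
    )
)
fs_loc = [
    {'loc': loc, 'ids': [x['id'] for x in grp], 'count': len(grp)}
    for loc, grp in it
]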

fs_loc

[
    {'count': 1, 'loc': 'Berlin', 'ids': [22]},
    {'count': 2, 'loc': 'Bern', 'ids': [24, 25]},
    {'count': 2, 'loc': 'Chicago', 'ids': [20, 23]},
    {'count': 1, 'loc': 'Frankfurt', 'ids': [21]}
]
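If sorting is not wanted, a plain dictionary can collect the groups instead, since it gathers equal keys regardless of input order. The sketch below is an alternative to itertools.groupby, not the approach used above. It works on a trimmed copy of the question's data, and it assumes that the string ids wrapped in {"id": ...}, the "locations" wrapper, and the ordering by descending count are all meant to mirror the sample output in the question.

```python
from collections import defaultdict

# Trimmed copy of the question's data; only the fields used here are kept.
dataset = {"users": [
    {"id": 20, "loc": "Chicago"},
    {"id": 21, "loc": "Frankfurt"},
    {"id": 22, "loc": "Berlin"},
    {"id": 23, "loc": "Chicago"},
    {"id": 24, "loc": "Bern"},
    {"id": 25, "loc": "Bern"},
]}

# Collect the ids for each location in a single pass, no sorting required.
ids_by_loc = defaultdict(list)
for user in dataset["users"]:
    ids_by_loc[user["loc"]].append(user["id"])

# Assumption: ids become strings wrapped in {"id": ...}, and entries are
# ordered by descending count, to match the question's sample output.
entries = [
    {"loc": loc, "count": len(ids), "ids": [{"id": str(i)} for i in ids]}
    for loc, ids in ids_by_loc.items()
]
entries.sort(key=lambda entry: -entry["count"])  # stable: ties keep first-seen order

locations = {"locations": entries}
print(locations)
```

Because the sort is stable, locations with the same count stay in the order in which they first appear in the input.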
