How can I avoid a double loop in Python?

Posted 2024-09-29 01:25:57


From the data at https://jsonplaceholder.typicode.com/todos, I want to count the "completed" items per user.

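Currently, my approach is to first collect the existing userId keys, and then, for each user, check every element of the dataset to see whether it belongs to that user and append it to that user's list of items.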

import json
from urllib import request

# Data from
uri = "https://jsonplaceholder.typicode.com/todos"

response = request.urlopen(uri).read()
data = json.loads(response)

users_items = {}

def get_user_ids(items):
    # Register every user ID as a key, with a placeholder value
    for item in items:
        users_items[item['userId']] = None

def get_user_items():
    # For each user, scan the whole dataset again (the double loop)
    for uid in users_items:
        items = []
        for item in data:
            if item['userId'] == uid:
                items.append(item['completed'])
        users_items[uid] = items

done_items_by_user = {}

def count_completed_by_user():
    # True counts as 1, so sum() gives the number of completed items
    for user in users_items:
        done_items_by_user[user] = sum(users_items[user])

get_user_ids(data)
get_user_items()
count_completed_by_user()

I especially dislike the double loop and having to initialize the dictionary values with placeholders in get_user_ids.


3 answers

Just use a defaultdict object:

import json
from urllib import request
from collections import defaultdict

# Data from
uri = "https://jsonplaceholder.typicode.com/todos"

response = request.urlopen(uri).read()
data = json.loads(response)


def count_user_completed_items(data):
    result = defaultdict(int)
    for item in data:
        if item['completed']:
            result[item['userId']] += 1
    return dict(result)


print(count_user_completed_items(data))

Output (the key is the user ID and the value is the number of completed items):

{1: 11, 2: 8, 3: 7, 4: 6, 5: 12, 6: 6, 7: 9, 8: 11, 9: 8, 10: 12}
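
For comparison, the same single-pass count can also be sketched with collections.Counter, a different standard-library tool from the defaultdict used above. The example runs offline on a small hand-made sample shaped like the todos records; the sample values and the function name count_completed are assumptions for illustration, not real API data.

```python
from collections import Counter

# Hypothetical sample shaped like the jsonplaceholder todos records.
sample = [
    {"userId": 1, "id": 1, "title": "a", "completed": False},
    {"userId": 1, "id": 2, "title": "b", "completed": True},
    {"userId": 2, "id": 3, "title": "c", "completed": True},
    {"userId": 2, "id": 4, "title": "d", "completed": True},
    {"userId": 3, "id": 5, "title": "e", "completed": False},
]


def count_completed(items):
    """Count completed items per user in one pass over the data."""
    # Only completed records reach the Counter, so each one adds 1 to its user.
    return dict(Counter(item["userId"] for item in items if item["completed"]))


print(count_completed(sample))  # {1: 1, 2: 2}
```

As with the defaultdict version, a user with no completed items (user 3 in the sample) does not appear in the result at all.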

The popular pandas library lets you do this in one line:

import pandas as pd
complete_items_per_user = pd.DataFrame(data).groupby('userId')['completed'].sum()

If you are asking what you can do without pandas, you can avoid explicit loops with a dict comprehension:

users = set(x['userId'] for x in data)
complete_items_per_user = {user: sum(x['completed'] for x in data if x['userId']==user) for user in users}
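
Note that the comprehension removes the explicit loop statements but still scans the data once per user, so the nested iteration is still there. A minimal offline sketch of the same approach, using a hand-made sample (the sample values and the function name completed_per_user are assumptions for illustration):

```python
# Hypothetical sample shaped like the jsonplaceholder todos records.
sample = [
    {"userId": 1, "completed": False},
    {"userId": 1, "completed": True},
    {"userId": 2, "completed": True},
    {"userId": 2, "completed": True},
    {"userId": 3, "completed": False},
]


def completed_per_user(items):
    """Dict comprehension version: one scan of items for every distinct user."""
    users = {x["userId"] for x in items}
    return {
        user: sum(x["completed"] for x in items if x["userId"] == user)
        for user in users
    }


result = completed_per_user(sample)
print(sorted(result.items()))  # [(1, 1), (2, 2), (3, 0)]
```

Unlike a count that only records completed items, this version also lists users whose count is zero, because every user ID is collected first.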

You can use the dict method get() to insert or update the count for each user ID:

done_items_by_user = dict()
for item in data:
    done_items_by_user[item['userId']] = done_items_by_user.get(item['userId'], 0) + item['completed']
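
The loop above works because bool is a subclass of int in Python, so adding True increments the count by 1 and adding False leaves it unchanged. A minimal offline sketch on a hand-made sample (the sample values and the function name count_done are assumptions for illustration):

```python
# Hypothetical sample shaped like the jsonplaceholder todos records.
sample = [
    {"userId": 1, "completed": False},
    {"userId": 1, "completed": True},
    {"userId": 2, "completed": True},
    {"userId": 2, "completed": True},
    {"userId": 3, "completed": False},
]


def count_done(items):
    """Single pass: get() supplies 0 the first time a user ID is seen."""
    done = {}
    for item in items:
        uid = item["userId"]
        # True adds 1, False adds 0.
        done[uid] = done.get(uid, 0) + item["completed"]
    return done


print(count_done(sample))  # {1: 1, 2: 2, 3: 0}
```

Every user ID seen in the data gets an entry, so users with no completed items show up with a count of 0.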
