Counting efficiently in a list of dictionaries

Posted 2024-09-30 14:30:08


I have a list of about 10,000 dictionaries. Each item has one or more labels and zero or one type. For example:

mydata = [{'item': 'item1', 'label': ['history','politics'], 'type': 'paper'},
     {'item': 'item2', 'label': ['sport','politics'], 'type': 'magazine'},
     {'item': 'item3', 'label': ['science','politics'], 'type': 'paper'},
     {'item': 'item4', 'label': ['science'], 'type': 'book'},
     {'item': 'item5', 'label': ['science','fun']}
     ]

I want to count how many items have a particular label, and how many of those items are of each type.

For the mydata object above, my output should look like this:

{'fun': {'magazine': 0, 'paper': 0, 'book': 0}, 
'science': {'magazine': 0, 'paper': 1, 'book': 1}, 
'sport': {'magazine': 1, 'paper': 0, 'book': 0}, 
'history': {'magazine': 0, 'paper': 1, 'book': 0}, 
'politics': {'magazine': 1, 'paper': 2, 'book': 0}}

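The code below works, but it's ugly and probably inefficient. Any suggestions for improving it? I read that collections.Counter() is relevant, and I learned how to use it for the labels, but I couldn't get it to work for the types within each label.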

### creating lists of unique labels, types
myLabelList=[]
myTypeList=[]

for myitem in mydata:
    for myCurrLabel in myitem['label']: #to account for multiple labels
        myLabelList.append(myCurrLabel) 
    if 'type' in myitem: #checking that type exists
        myTypeList.append(myitem['type'])  # only record real types; untyped items are skipped
    

myUniqueLabel=list(set(myLabelList))
myUniqueType=list(set(myTypeList))



myOutput = {}
for eachLabel in myUniqueLabel:
    myOutput[eachLabel] = {}
    for eachItem in myUniqueType:
        n = 0  # number of matches
        for k in mydata:
            if (eachLabel in k['label']) and (k.get('type') == eachItem):
                n += 1  
            else: n += 0
        myOutput[eachLabel][eachItem]=n


print(myOutput)
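Since the question mentions collections.Counter, here is a minimal sketch of one way to nest it for "types within labels": a defaultdict that maps each label to its own Counter of types, filled in a single pass over the data. It assumes the mydata structure above; the names per_label and all_types are illustrative, and the type columns are collected from the data rather than hard-coded.

```python
from collections import Counter, defaultdict

mydata = [{'item': 'item1', 'label': ['history', 'politics'], 'type': 'paper'},
          {'item': 'item2', 'label': ['sport', 'politics'], 'type': 'magazine'},
          {'item': 'item3', 'label': ['science', 'politics'], 'type': 'paper'},
          {'item': 'item4', 'label': ['science'], 'type': 'book'},
          {'item': 'item5', 'label': ['science', 'fun']}]

# One Counter of types per label; an unseen label gets an empty Counter automatically.
per_label = defaultdict(Counter)
all_types = set()  # illustrative: every type that occurs anywhere in the data

for item in mydata:
    item_type = item.get('type')  # None when the item has no type
    if item_type is not None:
        all_types.add(item_type)
    for label in item['label']:
        counter = per_label[label]  # accessing the key registers the label even if untyped
        if item_type is not None:
            counter[item_type] += 1

# Counter returns 0 for missing keys (without inserting them), so zero-filling is a lookup.
result = {label: {t: counts[t] for t in sorted(all_types)}
          for label, counts in per_label.items()}
print(result)
```

Each item and label is visited once, so the work grows with the total number of labels across all items, instead of rescanning the whole list for every label/type pair as the nested loops above do.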

1 answer
User
#1 · Posted 2024-09-30 14:30:08

You can build that dictionary in a single pass with nested loops:

mydata = [{'item': 'item1', 'label': ['history','politics'], 'type': 'paper'},
     {'item': 'item2', 'label': ['sport','politics'], 'type': 'magazine'},
     {'item': 'item3', 'label': ['science','politics'], 'type': 'paper'},
     {'item': 'item4', 'label': ['science'], 'type': 'book'},
     {'item': 'item5', 'label': ['science','fun']}
     ]


counters = {'magazine': 0, 'paper': 0, 'book': 0}
# counters = {d['type']: 0 for d in mydata if 'type' in d}  # use this if the types are not known in advance

result = dict()
for d in mydata:                                 # go through dictionary list
    itemType  = d.get('type',None)               # get the type
    for label in d['label']:                     # go through labels list
        labelCounts = result.setdefault(label,{**counters}) # add/get a label
        if itemType : labelCounts[itemType] += 1 # count items for type if any
        
print(result)

Output:

{'history': {'magazine': 0, 'paper': 1, 'book': 0},
 'politics': {'magazine': 1, 'paper': 2, 'book': 0},
 'sport': {'magazine': 1, 'paper': 0, 'book': 0},
 'science': {'magazine': 0, 'paper': 1, 'book': 1},
 'fun': {'magazine': 0, 'paper': 0, 'book': 0}}
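With the hard-coded counters, an item whose type is not magazine, paper or book would raise a KeyError at labelCounts[itemType] += 1. The sketch below wraps the same loop in a function and swaps in the commented-out "types not fixed" template, so the type columns come from the data. The function name count_by_label_and_type and the extra 'newsletter' item are hypothetical, used only for illustration.

```python
def count_by_label_and_type(data):
    """Hypothetical helper: the answer's single-pass loop with types taken from the data."""
    # One zero counter per type that actually occurs, in first-seen order.
    counters = {d['type']: 0 for d in data if 'type' in d}
    result = {}
    for d in data:
        item_type = d.get('type')  # None for items without a type
        for label in d['label']:
            # The first time a label is seen, store a fresh copy of the zero template.
            label_counts = result.setdefault(label, dict(counters))
            if item_type is not None:
                label_counts[item_type] += 1
    return result


mydata = [{'item': 'item1', 'label': ['history', 'politics'], 'type': 'paper'},
          {'item': 'item2', 'label': ['sport', 'politics'], 'type': 'magazine'},
          {'item': 'item3', 'label': ['science', 'politics'], 'type': 'paper'},
          {'item': 'item4', 'label': ['science'], 'type': 'book'},
          {'item': 'item5', 'label': ['science', 'fun']}]

# A sixth item whose type is missing from the hard-coded template.
extra = mydata + [{'item': 'item6', 'label': ['fun'], 'type': 'newsletter'}]
counts = count_by_label_and_type(extra)
print(counts['fun'])  # {'paper': 0, 'magazine': 0, 'book': 0, 'newsletter': 1}
```

Each label gets its own copy of the template, so incrementing one label's counts never affects another's. This is the same reason the answer uses {**counters} rather than the shared counters dict.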
