Fastest way to count the common items between two lists

Posted 2024-10-04 07:31:56


I have two lists like this:

listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]

I also have a query list like this:

queryList = ["abc","cccc","abc","yyy"]

queryList and listt[0] have 2 items in common: two "abc".

queryList and listt[1] have 3 items in common: one "abc", one "cccc" and one "yyy".

So I want output like this:

[2,3] #2 = Total common items between queryList & listt[0]
      #3 = Total common items between queryList & listt[1]

I am currently using loops to do this, but it seems slow. I will have millions of lists, each with thousands of items.

listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]

totalMatch = []
for hashtree in listt:
    matches = 0
    tempQueryHash = queryList.copy()
    for hash in hashtree:
        for i in range(len(tempQueryHash)):
            if tempQueryHash[i]==hash:
                matches +=1
                tempQueryHash[i] = "" #Don't Match the same block twice.
                break

    totalMatch.append(matches)
print(totalMatch)

3 Answers

I'm still learning the ropes of Python, but according to this older post on SO, something like the following should work:

from collections import Counter
listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]
OutputList = [len(list((Counter(x) & Counter(queryList)).elements())) for x in listt]
# [2, 3]

I'll keep an eye out for other approaches.
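To make the `&` step easier to follow, here is a small sketch of what the intersection of two Counter objects contains for listt[0] and queryList. The names `a`, `q` and `common` are only illustrative and are not part of the answer.

```python
from collections import Counter

a = Counter(["a", "abc", "zzz", "xxx", "abc", "abc"])  # "abc" occurs 3 times
q = Counter(["abc", "cccc", "abc", "yyy"])              # "abc" occurs 2 times

# & keeps, for each key, the smaller of the two counts and drops keys
# whose resulting count is zero.
common = a & q
print(common)                # Counter({'abc': 2})
print(sum(common.values()))  # 2
```

Each shared item is therefore counted at most as often as it appears in the list where it is rarer, which is the behaviour the question asks for.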

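You can compare every item of a sublist of listt (here listt[1]) with every item of queryList and count the matches. Note that this compares every pair, so duplicated items are counted more than once: for listt[1] it prints 4, not 3.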

output = ([i == z for i in listt[1] for z in queryList])
print(output.count(True))
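For the question's expected [2, 3], each query item should be matched at most once. A minimal sketch of one way to do that in a single pass over each sublist, assuming a helper called `count_matches` (a hypothetical name, not from the answer) that uses up query items as they are matched:

```python
from collections import Counter

listt = [["a", "abc", "zzz", "xxx", "abc", "abc"], ["yyy", "ggg", "abc", "cccc"]]
queryList = ["abc", "cccc", "abc", "yyy"]

def count_matches(items, query):
    # remaining[x] is how many more times x may still be matched
    remaining = Counter(query)
    matches = 0
    for item in items:
        if remaining[item] > 0:  # a Counter returns 0 for missing keys
            remaining[item] -= 1
            matches += 1
    return matches

totalMatch = [count_matches(x, queryList) for x in listt]
print(totalMatch)  # [2, 3]
```

This mirrors the loop in the question, but replaces the inner scan over queryList with a dictionary lookup.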

An improvement on JvdV's answer

Basically, sum the values instead of counting the elements, and cache the Counter for queryList so it is built only once.

from collections import Counter
listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]
queryListCounter = Counter(queryList)
OutputList = [sum((Counter(x) & queryListCounter).values()) for x in listt]
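If the sublists are much longer than queryList, one possible variation is to count only the items that occur in the query, so the per-list Counter stays small. This is a sketch, not a measured improvement; `count_common` is a hypothetical helper name, and whether it is actually faster should be checked with timeit on real data.

```python
from collections import Counter

listt = [["a", "abc", "zzz", "xxx", "abc", "abc"], ["yyy", "ggg", "abc", "cccc"]]
queryList = ["abc", "cccc", "abc", "yyy"]

def count_common(lists, query):
    query_counts = Counter(query)  # built once and reused for every sublist
    result = []
    for items in lists:
        # Count only the items that appear in the query at all.
        seen = Counter(item for item in items if item in query_counts)
        # Each item contributes the smaller of its two counts.
        result.append(sum(min(n, query_counts[item]) for item, n in seen.items()))
    return result

print(count_common(listt, queryList))  # [2, 3]
```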
