如何使这个列表运行得更快？

def removeDuplicatesFromList(seq): # Not order preserving keys = {} for e in seq: keys[e] = 1 return keys.keys() def countWordDistances(li): ''' If li = ['that','sank','into','the','ocean'] This function would return: { that:1, sank:2, into:3, the:4, ocean:5 } However, if there is a duplicate term, take the average of their positions ''' wordmap = {} unique_words = removeDuplicatesFromList(li) for w in unique_words: distances = [i+1 for i,x in enumerate(li) if x == w] wordmap[w] = float(sum(distances)) / float(len(distances)) #take average return wordmap

3条回答

网友

1楼 · 编辑于 2024-09-30 16:20:38

import collections
def countWordDistances(li):
    wordmap = collections.defaultdict(list)
    for i, w in enumerate(li, 1):
        wordmap[w].append(i)
    for k, v in wordmap.iteritems():
        wordmap[k] = sum(v)/float(len(v))

    return wordmap

这使得只对列表进行一次遍历，并将操作保持在最小值。我用110万个词条，29k个独特的单词，在一个单词列表上计时，它的速度几乎是Patrick答案的两倍。在一个由10k个单词组成的列表中，2k是独一无二的，它比OP的代码快了300多倍。在

为了使Python代码运行得更快，有两条规则需要记住：使用最好的算法，避免使用Python。在

在算法方面，迭代列表一次而不是N+1次（N=唯一单词的数量）是加快这一速度的主要因素。在

在“避免Python”方面，我的意思是：您希望您的代码尽可能多地在C中执行。因此，使用defaultdict比显式检查密钥是否存在的dict要好。defaultdict为您检查，但在Python实现中，它是用C语言进行的。enumerate比for i in range(len(li))好，同样因为它的Python步骤更少。并且enumerate(li, 1)使计数从1开始，而不是在循环中的某个地方使用Python+1。在

编辑：第三条规则：使用PyPy。我的代码在PyPy上的速度是2.7的两倍。在

网友

2楼 · 编辑于 2024-09-30 16:20:38

我不确定这是否会比使用集合更快，但它只需要通过列表一次：

def countWordDistances(li):
    wordmap = {}
    for i in range(len(li)):
        if li[i] in wordmap:
            avg, num = wordmap[li[i]]
            new_avg = avg*(num/(num+1.0)) + (1.0/(num+1.0))*i
            wordmap[li[i]] = new_avg, num+1
        else:
            wordmap[li[i]] = (i, 1)

    return wordmap

这将返回wordmap的修改版本，与每个键相关联的值是平均位置和出现次数的元组。显然可以很容易地将其转换为原始输出的形式，但这需要一些时间。在

代码基本上在遍历列表时保持一个运行平均值，每次都用加权平均值重新计算。在

网友

3楼 · 编辑于 2024-09-30 16:20:38

基于@Ned Batchelder的解决方案，但不创建虚拟列表：

import collections
def countWordDistances(li):
    wordmap = collections.defaultdict(lambda:[0.0, 0.0])
    for i, w in enumerate(li, 1):
        wordmap[w][0] += i
        wordmap[w][1] += 1.0
    for k, (t, n) in wordmap.iteritems():
        wordmap[k] = t / n
    return wordmap

相关问题更多 >

编程相关推荐

热门问题

热门文章