Python: How can I optimize this computation?

    import nltk

    measure = nltk.collocations.BigramAssocMeasures()
    dicto = {}
    # `lines` (the input lines) and `cpt` (used below as the total
    # number of lines) are not defined in the posted snippet.

    # For each word I create a dictionary entry with its line numbers
    for i in lines:
        tokens = nltk.wordpunct_tokenize(i)
        m = tokens[0]  # m is the word
        # Drop the trailing token and the comma separators
        list_i = [x for x in tokens[4:-1] if x != ',']
        dicto[m] = list_i

    # For each word I calculate the chi-squared with every other word,
    # and I think my problem starts right here:
    # the "for" loop and the z = ... line
    for word1 in dicto:
        x = dicto[word1]
        vector = []
        for word2 in dicto:
            y = dicto[word2]
            z = [val for val in x if val in y]
            # Contingency matrix
            m11 = cpt - (len(x) + len(y) - len(z))
            m12 = len(x) - len(z)
            m21 = len(y) - len(z)
            m22 = len(z)
            n_ii = m11
            n_ix = m11 + m21
            n_xi = m11 + m12
            n_xx = m11 + m12 + m21 + m22
            Chi_squared = measure.chi_sq(n_ii, (n_ix, n_xi), n_xx)
            # I compare with the minimum value to check independence between words
            if Chi_squared > 3.841:
                # The correlations calculated
                vector.append([word1, word2, round(Chi_squared, 3)])
        # I sort my vector in descending order
        final = sorted(vector, key=lambda vector: vector[2], reverse=True)
        print(word1)
        # I take the 4 best scores
        for i in final[:4]:
            print(i, end=' ')

2 Answers

User

Answer 1 · edited 2024-09-27 09:24:36

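First, if the line numbers for each word are unique, use sets instead of lists: computing a set intersection is much faster than computing a list intersection (especially when the lists are not sorted).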
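A minimal sketch of the set-based approach, assuming each word maps to the line numbers it occurs on. The `occurrences` data and the `shared_line_count` helper are hypothetical names used only for illustration; they are not part of the posted code.

```python
# Hypothetical data: each word maps to the line numbers it occurs on.
occurrences = {
    "cat": [1, 2, 5, 8],
    "dog": [2, 3, 5, 9],
    "fish": [4, 7],
}

# Convert each list to a set once, outside the pairwise loops.
line_sets = {word: set(nums) for word, nums in occurrences.items()}


def shared_line_count(word1, word2):
    """Return the number of lines on which both words occur."""
    # Set intersection uses hashing instead of scanning one list
    # for every element of the other.
    return len(line_sets[word1] & line_sets[word2])


print(shared_line_count("cat", "dog"))
```

The list comprehension `[val for val in x if val in y]` scans `y` once per element of `x`, whereas the set intersection does a hash lookup per element, so the gain grows with the length of the lists.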

Second, precompute the list lengths; right now you compute each of them twice in every step of the inner loop.
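One way to cache the lengths, sketched under the assumption that the line numbers are already stored as sets. `line_sets`, `total_lines` (standing in for `cpt` from the question), and `contingency` are hypothetical names chosen for this example.

```python
# Hypothetical data; total_lines stands in for `cpt` from the question.
line_sets = {
    "cat": {1, 2, 5, 8},
    "dog": {2, 3, 5, 9},
    "fish": {4, 7},
}
total_lines = 9

# Compute every length exactly once, before the pairwise loops.
sizes = {word: len(nums) for word, nums in line_sets.items()}


def contingency(word1, word2):
    """Return (both, only_word1, only_word2, neither) line counts."""
    both = len(line_sets[word1] & line_sets[word2])
    only1 = sizes[word1] - both
    only2 = sizes[word2] - both
    neither = total_lines - (sizes[word1] + sizes[word2] - both)
    return both, only1, only2, neither


print(contingency("cat", "dog"))
```

The four counts always sum to `total_lines`, which makes a cheap sanity check while refactoring.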

Third, use numpy for this kind of computation.
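The answer does not say how numpy would be applied, so the following is only one possible sketch. It assumes the data fits in a dense word-by-line 0/1 matrix and uses the standard 2x2 chi-squared statistic, n(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), rather than calling NLTK. All names (`words`, `line_sets`, `total_lines`, `incidence`) are hypothetical.

```python
import numpy as np

# Hypothetical data: line numbers are 1-based, total_lines stands in for `cpt`.
words = ["cat", "dog", "fish"]
line_sets = {"cat": {1, 2, 5, 8}, "dog": {2, 3, 5, 9}, "fish": {4, 7}}
total_lines = 9

# incidence[i, j] is 1 if words[i] occurs on line j + 1.
incidence = np.zeros((len(words), total_lines), dtype=np.int64)
for i, word in enumerate(words):
    incidence[i, [n - 1 for n in line_sets[word]]] = 1

counts = incidence.sum(axis=1)        # number of lines per word
both = incidence @ incidence.T        # both[i, j]: lines containing both words
only_i = counts[:, None] - both       # lines with word i but not word j
only_j = counts[None, :] - both       # lines with word j but not word i
neither = total_lines - (both + only_i + only_j)

# Chi-squared for every pair at once.
num = total_lines * (both * neither - only_i * only_j) ** 2
den = (both + only_i) * (both + only_j) * (only_i + neither) * (only_j + neither)
with np.errstate(divide="ignore", invalid="ignore"):
    # Pairs with an empty row or column margin get a score of 0.
    chi_sq = np.where(den > 0, num / den, 0.0)

print(np.round(chi_sq, 3))
```

This replaces both Python loops with a single matrix product. The dense matrix needs one entry per (word, line) pair, so for a large vocabulary a sparse representation may be a better fit.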

User

Answer 2 · edited 2024-09-27 09:24:36

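There are some opportunities for speedup, but my first concern is vector. Where is it initialized? In the posted code it accumulates n^2 entries and gets sorted n times! That seems unintentional. Should it be cleared? Should final be computed outside the loop?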

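final = sorted(vector, key=lambda vector: vector[2], reverse=True)

is functional, but the scoping is ugly (the lambda parameter reuses the name of the list being sorted); better would be:

final = sorted(vector, key=lambda entry: entry[2], reverse=True)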
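Since the question only prints the four best scores per word, a full sort may not be needed at all. The sketch below is a suggestion beyond what the answer states: it uses `heapq.nlargest` with `operator.itemgetter` as the key, on hypothetical `(word1, word2, chi_squared)` entries.

```python
import heapq
from operator import itemgetter

# Hypothetical scores for one word: (word1, word2, chi_squared).
scores = [
    ("cat", "dog", 12.5),
    ("cat", "fish", 4.1),
    ("cat", "bird", 7.3),
    ("cat", "cow", 9.9),
    ("cat", "ant", 5.0),
]


def top_scores(entries, k=4):
    """Return the k entries with the highest chi-squared value, best first."""
    # itemgetter(2) selects the score without shadowing any other name.
    return heapq.nlargest(k, entries, key=itemgetter(2))


best = top_scores(scores)
for entry in best:
    print(entry)
```

Building a fresh `scores` list for each word and calling `top_scores` once per word keeps the result from accumulating entries across words.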

In general, to track down timing problems, consider using a profiler.
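As an illustration, here is a minimal profiling sketch with the standard library's `cProfile` and `pstats`. The `pairwise_overlaps` function is a hypothetical stand-in for the slow pairwise loop in the question, and the data is made up.

```python
import cProfile
import io
import pstats


def pairwise_overlaps(line_lists):
    """Hypothetical stand-in for the slow pairwise loop."""
    total = 0
    for x in line_lists:
        for y in line_lists:
            total += len([val for val in x if val in y])
    return total


# Made-up input: 40 overlapping lists of 30 line numbers each.
data = [list(range(i, i + 30)) for i in range(40)]

profiler = cProfile.Profile()
profiler.enable()
result = pairwise_overlaps(data)
profiler.disable()

# Collect the five most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, running `python -m cProfile -s cumulative script.py` from the command line gives a similar report without changing the code.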
