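<p>As another answer mentions, you are calling <code>tb(blob)</code> far too often: for a document with N words, you call it more than N^2 times. That will always be slow. You need to make a change like this:</p>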
<pre class="lang-py prettyprint-override"><code>for index, blob in enumerate(bloblist):
    print("Top words in document {}".format(index + 1))
    # Build the TextBlob once per document and reuse it for every word
    tblob = tb(blob)
    scores = {word: tfidf(word, tblob, bloblist) for word in tblob.words}
    sorted_words = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    for i, (word, score) in enumerate(sorted_words[:5], start=1):
        print("\tWord {}: {}, TF-IDF: {}".format(i, word, round(score, 5)))
</code></pre>
<p>You will also need to change the tfidf functions so that they use <code>tblob</code> each time instead of calling <code>tb(blob)</code>.</p>
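<p>A minimal sketch of what that change to the tfidf helpers could look like. The original helper functions are not shown here, so the names <code>tf</code>, <code>n_containing</code> and <code>idf</code> are assumptions, and <code>SimpleBlob</code> is a hypothetical stand-in for a TextBlob object (it only exposes <code>.words</code>) so the example runs without the library. The idea is that every helper receives objects that were parsed once up front, and never calls <code>tb(blob)</code> itself.</p>

```python
import math


class SimpleBlob:
    """Hypothetical stand-in for a TextBlob: exposes only a .words list."""

    def __init__(self, text):
        self.words = text.lower().split()


def tf(word, tblob):
    # Term frequency inside one already-parsed document.
    return tblob.words.count(word) / len(tblob.words)


def n_containing(word, tblobs):
    # Number of already-parsed documents that contain the word.
    return sum(1 for tblob in tblobs if word in tblob.words)


def idf(word, tblobs):
    # Inverse document frequency; the +1 avoids division by zero.
    return math.log(len(tblobs) / (1 + n_containing(word, tblobs)))


def tfidf(word, tblob, tblobs):
    return tf(word, tblob) * idf(word, tblobs)


docs = ["the cat sat", "the dog barked", "a bird sang loudly"]
tblobs = [SimpleBlob(doc) for doc in docs]  # parse each document exactly once
scores = {word: tfidf(word, tblobs[0], tblobs) for word in tblobs[0].words}
```

<p>With real TextBlob objects you would presumably build the list once, for example <code>tblobs = [tb(blob) for blob in bloblist]</code>, and pass <code>tblobs</code> to <code>tfidf</code> in place of the raw <code>bloblist</code>.</p>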