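<p>First, initialize a set of stop words, then count the words in the text after normalizing them (removing punctuation, lowercasing, etc.).</p>
<p>Then you can sum the dict values for the words that are not in the stop-word set.</p>
<p>I used part of your code, but with the approach detailed above.</p>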
<pre class="lang-py prettyprint-override"><code>import string
from collections import defaultdict

def normalize(line):
    # lowercase the line and strip ASCII punctuation
    line = line.lower()
    return line.translate(str.maketrans("", "", string.punctuation))

# create a normalized stop-word set
stop_words = set()
with open("stopwords.txt", "r") as f:
    for line in f:
        stop_words.update(normalize(line).split())

# create normalized-words count dictionary
words_count = defaultdict(int)
with open("usconst.txt", "r") as f:
    for line in f:
        for w in normalize(line).split():
            words_count[w] += 1

# list by most frequent words which are not stop-words
sorted([(k, v) for k, v in words_count.items() if k not in stop_words], reverse=True, key=lambda x: x[1])
</code></pre>
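<p>If it helps, here is a minimal sketch of the same idea using <code>collections.Counter</code>, including the summing step mentioned above. The two in-memory lists are hypothetical stand-ins for <code>stopwords.txt</code> and <code>usconst.txt</code>, so the snippet runs without any files; the names <code>stop_lines</code>, <code>text_lines</code>, <code>non_stop_total</code> and <code>ranked</code> are my own and not from your code.</p>

```python
import string
from collections import Counter

def normalize(line):
    """Lowercase a line and strip ASCII punctuation."""
    return line.lower().translate(str.maketrans("", "", string.punctuation))

# Hypothetical sample data standing in for the two input files
stop_lines = ["The, of", "and to"]
text_lines = [
    "We the People of the United States,",
    "in Order to form a more perfect Union,",
]

# Normalized stop-word set
stop_words = {w for line in stop_lines for w in normalize(line).split()}

# Normalized word counts for the whole text
words_count = Counter(w for line in text_lines for w in normalize(line).split())

# Total number of occurrences of words that are not stop words
non_stop_total = sum(n for w, n in words_count.items() if w not in stop_words)

# Non-stop words, most frequent first
ranked = sorted(
    ((w, n) for w, n in words_count.items() if w not in stop_words),
    key=lambda item: item[1],
    reverse=True,
)

print(non_stop_total)
print(ranked)
```

<p>With real files you would swap the two lists for the open file objects, since iterating a file yields lines in the same way.</p>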