Finding the most common word in one file that does not appear in another file, using Python

from collections import Counter
import re

dgWords = re.findall(r'\w+', open('test.txt').read().lower())
f = open('test2.txt', 'rb')
sWords = [line.strip() for line in f]
print(len(dgWords))
for sWord in sWords:
    print(sWord)
    print(dgWords)
    while sWord in dgWords:
        dgWords.remove(sWord)
print(len(dgWords))
mostFrequentWord = Counter(dgWords).most_common(1)
print(mostFrequentWord)

3 Answers

User

#1 · Edited 2024-09-26 18:20:00

Here is my approach, using sets:

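import re
from collections import Counter

with open('test.txt') as text_file:
    all_words = re.findall(r'\w+', text_file.read().lower())

# Text mode ('r'), so the stop words are str like the regex matches
with open('test2.txt', 'r') as f:
    stop_words = [line.strip() for line in f]

set_all = set(all_words)
set_stop = set(stop_words)

# Words that occur in test.txt but not in the stop list
all_only = set_all - set_stop

print(Counter(w for w in all_words if w in all_only).most_common(1))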

This should be slightly faster, because the Counter is built only from the words in 'all_only'.
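A variation on the same idea, as a minimal sketch rather than the poster's exact method: filter each word against the stop set directly and count what is left. The function name most_common_excluding and the sample inputs are hypothetical, and lowercasing the stop words is an added assumption so that they compare equal to the lowercased text.

```python
import re
from collections import Counter

def most_common_excluding(text, stop_words):
    """Return (word, count) for the most frequent non-stop word, or None."""
    # Assumption: stop words are lowercased to match the lowercased text
    stop = {w.lower() for w in stop_words}
    counts = Counter(
        w for w in re.findall(r'\w+', text.lower()) if w not in stop
    )
    top = counts.most_common(1)
    return top[0] if top else None

# Hypothetical sample data standing in for test.txt and test2.txt
result = most_common_excluding("the cat sat on the mat with the cat", ["the", "on"])
print(result)  # ('cat', 2)
```

Membership tests against a set take constant time on average, so this avoids the repeated list scans and removals in the original loop.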

User

#2 · Edited 2024-09-26 18:20:00

import re
from collections import Counter

with open('test.txt') as testfile, open('test2.txt') as stopfile:
    stopwords = set(line.strip() for line in stopfile)
    # Reuse the open handle instead of opening test.txt a second time
    words = Counter(re.findall(r'\w+', testfile.read().lower()))
    for word in stopwords:
        # pop with a default does nothing if the word was never counted
        words.pop(word, None)
    print("the most frequent word is", words.most_common(1))

User

#3 · Edited 2024-09-26 18:20:00

I simply changed the following line of your original code

f = open('test2.txt', 'rb')

to

f = open('test2.txt', 'r')

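and it worked. Just read the file as text (strings) rather than as binary. Otherwise the stop words are bytes and never match the strings returned by the regular expression. Tested on Python 3.4, Eclipse PyDev, Windows 7 x64.

Off topic:

Opening files with a with statement is more Pythonic. In this case, write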
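To illustrate why binary mode breaks the match, here is a small sketch. The sample data and the names text_words, binary_stop and text_stop are made up for illustration and are not from the original post. In Python 3, lines read in 'rb' mode are bytes, and bytes never compare equal to str:

```python
import re

# Hypothetical sample data standing in for test.txt and test2.txt
text_words = re.findall(r'\w+', "The cat and the dog".lower())

# What reading the stop file yields in 'rb' mode versus 'r' mode
binary_stop = [line.strip() for line in b"the\nand\n".splitlines()]
text_stop = [line.strip() for line in "the\nand\n".splitlines()]

# 'the' == b'the' is False, so no word is ever found in the bytes list
binary_hits = [w for w in text_words if w in binary_stop]
text_hits = [w for w in text_words if w in text_stop]

print(binary_hits)  # []
print(text_hits)    # ['the', 'and', 'the']
```

This is why the while loop in the question never removes anything: no word from the text is ever found among the bytes stop words.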

with open('test2.txt', 'r') as f:

and indent the file-handling statements accordingly. This way you cannot forget to close the file stream.
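As a rough sketch of that behaviour: the temporary file below is only there to make the example self-contained and is not part of the original code. Once the with block exits, the file object reports itself as closed:

```python
import os
import tempfile

# Create a throwaway stop-word file so the sketch runs on its own
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write("the\nand\n")

with open(path, 'r') as f:
    stop_words = [line.strip() for line in f]

# The with block closed the file on exit, even without an explicit f.close()
file_closed = f.closed
print(stop_words, file_closed)  # ['the', 'and'] True

os.remove(path)
```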
