Finding words that appear only once in a file

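def retHapax():
    myMap = {}      # every word seen so far
    uniqueMap = {}  # words seen exactly once so far
    with open("myfile.txt") as file:
        for i in file:
            myList = i.split(' ')
            for j in myList:
                j = j.rstrip()
                if j in myMap:
                    # pop() rather than del: a third occurrence must not raise KeyError
                    uniqueMap.pop(j, None)
                else:
                    myMap[j] = 1
                    uniqueMap[j] = 1
    print(uniqueMap)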

3 Answers

User

#1 · edited 2024-10-01 15:37:04

If you want to find all unique words and treat foo and foo. as the same word, you need to strip punctuation:

from collections import Counter
from string import punctuation

with open("myfile.txt") as f:
    word_counts = Counter(word.strip(punctuation) for line in f for word in line.split())

# items() works on both Python 2 and 3; iteritems() exists only on Python 2
print([word for word, count in word_counts.items() if count == 1])

To ignore case as well, call line.lower(). If you want the set of unique words to be accurate, splitting lines on whitespace alone is not enough.
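As a rough sketch of that last point, the version below lowercases each line and extracts words with a regular expression instead of splitting on whitespace. The helper name `hapax_words`, the pattern `WORD_RE`, and the sample lines are all assumptions made for illustration; the pattern only covers ASCII letters, digits, and inner apostrophes, so adjust it to your own definition of a word.

```python
import re
from collections import Counter

# Hypothetical pattern: runs of ASCII letters/digits, optionally joined by
# apostrophes (so "don't" stays one word). Applied to lowercased text.
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")

def hapax_words(lines):
    """Return, sorted, the words that occur exactly once, ignoring case and punctuation."""
    counts = Counter(
        word
        for line in lines
        for word in WORD_RE.findall(line.lower())
    )
    return sorted(word for word, count in counts.items() if count == 1)

# Illustrative input; any iterable of strings works, including an open file.
sample = ["Foo bar, foo.", "Baz bar!"]
print(hapax_words(sample))  # ['baz']
```

Because the function accepts any iterable of lines, it can be called as `hapax_words(f)` inside a `with open("myfile.txt") as f:` block.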

User

#2 · edited 2024-10-01 15:37:04

Try this approach, which uses Counter to count the words in the file:

from collections import Counter
with open("myfile.txt") as input_file:
    word_counts = Counter(word for line in input_file for word in line.split())
# List of unique words (words that appear exactly once)
print([word for (word, count) in word_counts.items() if count == 1])

User

#3 · edited 2024-10-01 15:37:04

I would use the collections.Counter approach, but if you only want to use sets, you can do it like this:

with open('myfile.txt') as input_file:
    all_words = set()
    dupes = set() 
    for word in (word for line in input_file for word in line.split()):
        if word in all_words:
            dupes.add(word)
        all_words.add(word)

    unique = all_words - dupes

Given the input:

^{pr2}$

The output is:

{'five', 'one', 'six'}
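To see that the set-based approach and the Counter approach give the same result, here is a small self-contained comparison. The function names and the list of lines are invented for this sketch and are not the input used above.

```python
from collections import Counter

def unique_with_sets(lines):
    """Words seen exactly once, tracked with two sets."""
    all_words = set()
    dupes = set()
    for line in lines:
        for word in line.split():
            if word in all_words:
                dupes.add(word)   # seen before, so it is not unique
            all_words.add(word)
    return all_words - dupes

def unique_with_counter(lines):
    """Words seen exactly once, using Counter."""
    counts = Counter(word for line in lines for word in line.split())
    return {word for word, count in counts.items() if count == 1}

# Hypothetical sample data
lines = ["red green blue", "green blue blue", "yellow"]
print(sorted(unique_with_sets(lines)))                       # ['red', 'yellow']
print(unique_with_sets(lines) == unique_with_counter(lines))  # True
```

The set version stores only the words themselves, while Counter also keeps the counts, which is handy if you later need words that appear exactly twice, or the most common ones.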
