Answers to the question: Frequency of keywords in a list

Frequency of keywords in a list


1 answer

Anonymous, 1 day ago

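This is a solution that uses no imports. It relies on a nested linear search, which is acceptable for a small number of lookups over a small input, but becomes unwieldy and slow as the input grows. The input here is still fairly large, yet it is processed in a reasonable time. I suspect that the slowdown would start to show if your keyword file were larger (mine has only 3 words).

Here we take an input file, iterate over its lines and strip the punctuation, then split on whitespace and flatten all the words into a single list. The list contains duplicates, so to collapse them we sort the list, which brings identical words together, and then iterate over it to build a new list of word and count pairs. We do this by incrementing the count for as long as the same word keeps appearing, and starting a new entry whenever a new word is seen.

Now you have your word frequency list, and you can search it for each required keyword and retrieve its count.

The input text file is <a href="https://courses.cs.washington.edu/courses/cse341/07wi/handouts/hamlet.txt" rel="nofollow noreferrer">here</a>, and the keyword file can be cobbled together from a few words in that file, one per line.

The code is Python 3; comments indicate how to modify it for Python 2 where applicable.

<pre><code># Use string.punctuation if you are somehow allowed
# to import the string module.
translator = str.maketrans('', '', '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')

words = []
with open('hamlet.txt') as f:
    for line in f:
        # strip punctuation, then split the line into words
        line = line.translate(translator)
        # py 2 alternative
        #line = line.translate(None, string.punctuation)
        words.extend(line.strip().split())

# sort the word list, so instances of the same word are
# contiguous in the list and can be counted together
words.sort()

thisword = ''
counts = []

# for each word in the list add to the count as long as the
# word does not change
for w in words:
    if w != thisword:
        counts.append([w, 1])
        thisword = w
    else:
        counts[-1][1] += 1

for c in counts:
    print('%s (%d)' % (c[0], c[1]))

# function to prevent need to break out of nested loop
def findword(clist, word):
    for c in clist:
        if c[0] == word:
            return c[1]
    return 0

# open keywords file and search for each word in the
# frequency list.
with open('keywords.txt') as f2:
    for line in f2:
        word = line.strip()
        # skip blank lines in the keywords file
        if word:
            thiscount = findword(counts, word)
            print('keyword %s appears %d times in source' % (word, thiscount))
</code></pre>

If you like, you can modify <code>findword</code> to use a binary search, but it still won't come close to a <code>dict</code>. <code>collections.Counter</code> is the right solution when there are no restrictions: it is faster and takes less code.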
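The two alternatives mentioned at the end of the answer could look roughly like the sketch below. It is not the answerer's code: `findword_binary` and `count_words` are hypothetical names introduced for illustration. The binary search assumes the `[word, count]` list is sorted by word, as it is when built from a sorted word list, and the `Counter` version assumes imports are allowed.

```python
from collections import Counter

# Same punctuation set as the translator in the answer (string.punctuation).
PUNCTUATION = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
TRANSLATOR = str.maketrans('', '', PUNCTUATION)


def findword_binary(clist, word):
    """Binary search over [word, count] pairs sorted by word.

    Hypothetical replacement for findword; returns 0 if the word is absent.
    """
    lo, hi = 0, len(clist) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = clist[mid][0]
        if current == word:
            return clist[mid][1]
        if current < word:
            lo = mid + 1   # target sorts after the middle entry
        else:
            hi = mid - 1   # target sorts before the middle entry
    return 0


def count_words(lines):
    """Count words with collections.Counter after stripping punctuation."""
    counter = Counter()
    for line in lines:
        counter.update(line.translate(TRANSLATOR).split())
    return counter


if __name__ == '__main__':
    # A one-line stand-in for the input file, so the sketch runs on its own.
    sample = ['To be, or not to be: that is the question.']
    counter = count_words(sample)
    # Rebuild the sorted [word, count] list that the binary search expects.
    counts = [[w, n] for w, n in sorted(counter.items())]
    for keyword in ('be', 'question', 'Hamlet'):
        print('keyword %s: Counter=%d, binary search=%d'
              % (keyword, counter[keyword], findword_binary(counts, keyword)))
```

To run it against the real files, pass the open file object to `count_words` instead of the sample list. A `Counter` lookup such as `counter[keyword]` returns 0 for a missing word, so no helper like `findword` is needed in that version.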
