Answers to the question: printing unigram counts with Python

Printing unigram counts with Python

Category: Python Q&A


1 answer

Anonymous, 1 day ago

Skilled in: Python, MySQL, Java

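You have too many lists. Also, don't rely on all those magic numbers for the number of lines, the maximum number of words or entries per list, and so on. Instead of one list for the words of each line, use a single list for all words. And instead of a second list for the counts, use a <a href="https://docs.python.org/3/library/stdtypes.html#dict" rel="nofollow noreferrer">dictionary</a> to hold the unique words and their counts:
<pre><code>with open("corpus.txt") as f:
    counts = {}
    for line in f:
        # split() with no arguments splits on any whitespace
        for word in line.split():
            if word not in counts:
                counts[word] = 1
            else:
                counts[word] += 1
</code></pre>
Afterwards, <code>counts</code> looks like this: <code>{'peter': 4, 'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4, 'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}</code>. To retrieve the words and their counts, you can again use a loop: ^{pr2}$
Of course, you could use <code>collections.Counter</code> to do the same thing in fewer lines of code, but I think doing it manually will help you learn more about Python.
<hr/>
Honestly, I don't understand what any of the code below <code>for i in [1,2,3,4]:</code> is supposed to do. It seems you want to build some kind of co-occurrence matrix for the words? In that case, I would also suggest a (nested) dictionary, which makes it much easier to store and retrieve entries.
<pre><code>with open("corpus.txt") as f:
    matrix = {}
    for line in f:
        for word1 in line.split():
            if word1 not in matrix:
                matrix[word1] = {}
            # count every other word that appears on the same line
            for word2 in line.split():
                if word2 != word1:
                    if word2 not in matrix[word1]:
                        matrix[word1][word2] = 1
                    else:
                        matrix[word1][word2] += 1
</code></pre>
The code is almost the same as before, but with another nested loop over the words on the same line. For example, the output for <code>"peter"</code> would be <code>{'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4, 'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}</code>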
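The answer mentions `collections.Counter` as a shorter alternative and a loop for reading the words and counts back out. A minimal sketch of both might look like the following. The helper name `unigram_counts` and the two sample lines are assumptions made for illustration; they are not taken from the original question or from its `corpus.txt`.

```python
from collections import Counter


def unigram_counts(lines):
    """Count whitespace-separated words across an iterable of lines.

    Hypothetical helper, not part of the original answer.
    """
    counts = Counter()
    for line in lines:
        # Counter.update adds one to the count of each word in the list
        counts.update(line.split())
    return counts


# Made-up sample lines standing in for the contents of corpus.txt
sample_lines = [
    "peter piper picked a peck",
    "a peck of pickled peppers",
]

counts = unigram_counts(sample_lines)

# Print each word with its count, most frequent first
for word, count in counts.most_common():
    print(word, count)
```

Because an open file is itself an iterable of lines, the same helper could be called as `unigram_counts(f)` inside a `with open("corpus.txt") as f:` block. A plain `dict` built by hand, as in the answer, can be printed the same way with `for word, count in counts.items():`.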
