Printing unigram counts with Python

peter piper picked a peck of pickled peppers
a peck of pickled peppers peter piper picked
if peter piper picked a peck of pickled peppers
where s the peck of pickled peppers peter piper picked

f = open("corpus.txt","r")
w, h = 100, 100;
k=1
a=0
uwordcount=[]
for i in range(100):
    uwordcount.append(0)
uword = [[0 for x in range(w)] for y in range(h)]
l = [[0 for x in range(w)] for y in range(h)]
l[1] = f.readline()
l[2] = f.readline()
l[3] = f.readline()
l[4] = f.readline()
lwords = [[0 for x in range(w)] for y in range(h)]
lwords[1]=l[1].split()
lwords[2]=l[2].split()
lwords[3]=l[3].split()
lwords[4]=l[4].split()
for i in [1,2,3,4]:
    for j in range(len(lwords[i])):
        uword[k]=lwords[i][j]
        uwordcount[k]=0
        for x in [1,2,3,4]:
            for y in range(len(lwords[i])):
                if uword[k] == lwords[x][y]:
                    uwordcount[k]=uwordcount[k]+1
        for z in range(k):
            if uword[k]==uword[z]:
                a=1
        if a==0:
            print(uwordcount[k],' ',uword[k])
            k=k+1

3 answers

User

Answer 1 · edited 2024-06-26 09:14:45

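You have far too many lists. Also, don't rely on all those magic numbers for the number of lines, the maximum number of words or entries per list, and so on. Instead of one list for the words of each line, just use one list for all the words. And instead of a second list for the counts, just use a dictionary to hold the unique words together with their counts: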

with open("corpus.txt") as f:
    counts = {}
    for line in f:
        for word in line.split():
            if word not in counts:
                counts[word] = 1
            else:
                counts[word] += 1

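Afterwards, counts looks like this:

{'peter': 4, 'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4, 'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}

To retrieve the words and their counts, you can again use a loop, for example:

for word, count in counts.items():
    print(count, word)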
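The question prints each count next to its word, so it can help to see the whole pipeline in one place. The following is a minimal sketch, not the answer's own code: the corpus is inlined as `corpus_lines` so that no `corpus.txt` is needed, and `count_words` and `format_counts` are illustrative names.

```python
# Illustrative only: the four corpus lines from the question, inlined.
corpus_lines = [
    "peter piper picked a peck of pickled peppers",
    "a peck of pickled peppers peter piper picked",
    "if peter piper picked a peck of pickled peppers",
    "where s the peck of pickled peppers peter piper picked",
]

def count_words(lines):
    """Count how often each whitespace-separated word occurs."""
    counts = {}
    for line in lines:
        for word in line.split():
            # dict.get returns 0 for unseen words, replacing the if/else
            counts[word] = counts.get(word, 0) + 1
    return counts

def format_counts(counts):
    """Return 'count word' strings, most frequent words first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"{count} {word}" for word, count in ordered]

counts = count_words(corpus_lines)
print("\n".join(format_counts(counts)))
```

Because Python's sort is stable, words with the same count keep the order in which they first appeared in the corpus.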

Of course, you could do the same in fewer lines of code with collections.Counter, but I think doing it by hand will help you learn more about Python.

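Honestly, I don't understand what any of the code below for i in [1,2,3,4]: is supposed to do. It looks like you want to build some kind of co-occurrence matrix for the words? In that case I would also suggest a (nested) dictionary, which makes it much easier to store and retrieve entries.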

with open("corpus.txt") as f:
    matrix = {}
    for line in f:
        for word1 in line.split():
            if word1 not in matrix:
                matrix[word1] = {}
            for word2 in line.split():
                if word2 != word1:
                    if word2 not in matrix[word1]:
                        matrix[word1][word2] = 1
                    else:
                        matrix[word1][word2] += 1

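The code is almost the same as before, but with another nested loop over the words of the same line. For example, the output for "peter" would be {'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4, 'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}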
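The nested dictionary can also be written without the explicit membership checks. This is a sketch of an alternative, not the answer's own code, using `collections.defaultdict` and `collections.Counter` from the standard library. `cooccurrence` is a hypothetical helper name, and the corpus is inlined so the block runs without `corpus.txt`.

```python
from collections import Counter, defaultdict

# Illustrative only: the four corpus lines from the question, inlined.
corpus_lines = [
    "peter piper picked a peck of pickled peppers",
    "a peck of pickled peppers peter piper picked",
    "if peter piper picked a peck of pickled peppers",
    "where s the peck of pickled peppers peter piper picked",
]

def cooccurrence(lines):
    """Map each word to a Counter of the other words sharing a line with it."""
    matrix = defaultdict(Counter)
    for line in lines:
        words = line.split()  # split once per line instead of twice
        for word1 in words:
            for word2 in words:
                if word2 != word1:
                    # Missing keys start at 0, so no "not in" checks are needed
                    matrix[word1][word2] += 1
    return matrix

matrix = cooccurrence(corpus_lines)
print(dict(matrix["peter"]))
print(matrix["if"]["peter"])  # how often "if" and "peter" share a line
```

One difference to keep in mind: looking up a word that never occurred creates an empty entry in a `defaultdict`, whereas the plain dictionary version raises a `KeyError`.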

User

Answer 2 · edited 2024-06-26 09:14:45

Honestly, I couldn't follow your code, because it has more loops and logic than it needs (I think). So I did it my own way.

import pprint

with open('corpus.txt', 'r') as cr:
    dic = {}  # Empty dictionary

    for line in cr:
        for word in line.split():  # Count words, not whole lines
            if word in dic:   # If the word is already a key, add 1 to its count
                dic[word] += 1
            else:
                dic[word] = 1   # Otherwise start its count at 1

pprint.pprint(dic)  # pprint prints the dictionary in a readable layout

If you are in a real hurry, use collections.Counter.

User

Answer 3 · edited 2024-06-26 09:14:45

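IndexError: list index out of range means that one of your indexes tries to access something beyond the end of a list. You need to debug your code to find out where that happens.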
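One likely culprit, judging from the question's code: the inner loop takes its range from `len(lwords[i])` but then indexes `lwords[x][y]`, so a longer line `i` pushes `y` past the end of a shorter line `x`. The sketch below reproduces that pattern with two of the corpus lines. The names `short_line`, `long_line` and `last_word_using_wrong_length` are illustrative and do not appear in the question.

```python
short_line = "peter piper picked a peck of pickled peppers".split()            # 8 words
long_line = "where s the peck of pickled peppers peter piper picked".split()   # 10 words

def last_word_using_wrong_length(indexed, measured):
    """Index one list using the length of another, as the question's loop does."""
    word = None
    for y in range(len(measured)):
        word = indexed[y]  # fails once y >= len(indexed)
    return word

try:
    last_word_using_wrong_length(short_line, long_line)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

Iterating over the list itself (`for word in line_words:`) instead of over index ranges avoids this class of error entirely.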

Use collections.Counter to simplify this task:

# with open('corpus.txt', 'r') as r: text = r.read()

text = """peter piper picked a peck of pickled peppers 
 a peck of pickled peppers peter piper picked 
 if peter piper picked a peck of pickled peppers 
 where s the peck of pickled peppers peter piper picked """

from collections import Counter

# split the text in lines, then each line into words and count those:
c = Counter( (x for y in text.strip().split("\n") for x in y.split()) )

# format the output
print(*(f"{cnt} {wrd}" for wrd,cnt in c.most_common()), sep="\n")

Output:

4 peter
4 piper
4 picked
4 peck
4 of
4 pickled
4 peppers
3 a
1 if
1 where
1 s
1 the
