collections.Counter() is counting letters instead of words

from collections import Counter

all_words = ' '
for msg in df['messages'].values:
    words = str(msg).lower()
    all_words = all_words + str(words) + ' '

count = Counter(all_words)
count.most_common(3)

2 Answers

User

#1 · edited 2024-10-01 07:23:05

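from collections import Counter

all_words = []
for msg in df['messages'].values:
    # Lowercase the message and split it into words on any whitespace
    words = str(msg).lower().split()
    all_words.extend(words)

# Counter now receives a list of words, not a string of characters
count = Counter(all_words)
count.most_common(3)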
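
A small side note on the splitting step: split(' ') and split() with no argument behave differently when a message contains consecutive spaces. The sketch below uses a made-up sample string (not taken from the question's data) to show how empty strings can end up being counted as "words".

```python
from collections import Counter

# Hypothetical sample text with repeated spaces between words
text = "spam  and   spam"

# split(' ') cuts at every single space, so runs of spaces yield empty strings
by_space = text.split(' ')

# split() with no argument treats any run of whitespace as one separator
by_whitespace = text.split()

count_by_space = Counter(by_space)
count_by_whitespace = Counter(by_whitespace)

print(count_by_space.most_common(1))       # the empty string comes out on top
print(count_by_whitespace.most_common(1))  # 'spam' comes out on top
```

If the messages may contain irregular spacing, split() with no argument is probably the safer choice.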

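User

#2 · edited 2024-10-01 07:23:05

Counter iterates over whatever you pass to it. If you pass it a string, it iterates over the string's characters, so characters are what it counts. If you pass it a list in which each element is a word, it counts words.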

from collections import Counter

text = "spam and more spam"

c = Counter()
c.update(text)  # text is a str, so characters are counted
c
# Counter({'s': 2, 'p': 2, 'a': 3, 'm': 3, [...], 'e': 1})

c = Counter()
c.update(text.split())  # now a list: ['spam', 'and', 'more', 'spam']
c
# Counter({'spam': 2, 'and': 1, 'more': 1})

So, you should do this:

from collections import Counter

all_words = []
for msg in df['messages'].values:
    words = str(msg).lower().split()  # split each message into a list of words
    all_words.extend(words)           # add the words themselves, not the whole message

count = Counter(all_words)
count.most_common(3)

# the same, but with a generator expression
count = Counter(word for msg in df['messages'].values for word in str(msg).lower().split())
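
To try this without a DataFrame, here is a minimal self-contained sketch. The messages list and the count_words helper are made up for illustration; the list simply stands in for df['messages'].values.

```python
from collections import Counter

# Hypothetical sample data standing in for df['messages'].values
messages = ["Spam and eggs", "more spam", "Eggs and SPAM"]


def count_words(msgs):
    """Count words across all messages, ignoring case."""
    counts = Counter()
    for msg in msgs:
        # update() with a list of words adds one count per word
        counts.update(str(msg).lower().split())
    return counts


count = count_words(messages)
print(count.most_common(3))
```

Calling update() once per message avoids building one large intermediate list, which may matter if there are many messages.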
