Removing a list of stop words from a Counter in Python

User

Answer 1 · edited 2024-10-02 16:29:14

I would flatten the lines into words, skip any stop words, and feed the result into a single Counter:

from collections import Counter
from itertools import chain

lines = [
    "this is a concordance string something", 
    "this is another concordance string blah"
]

stops = {'this', 'that', 'a', 'is'}    
words = chain.from_iterable(line.split() for line in lines)
count = Counter(word for word in words if word not in stops)
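The same flatten-and-filter approach can be wrapped in a reusable function. This is a minimal sketch; the function name `count_without_stops` is hypothetical and not part of the original answer, and words are split on any whitespace.

```python
from collections import Counter
from itertools import chain
from typing import AbstractSet, Iterable


def count_without_stops(lines: Iterable[str], stops: AbstractSet[str]) -> Counter:
    """Count whitespace-separated words across all lines, skipping stop words."""
    # Flatten every line into one stream of words.
    words = chain.from_iterable(line.split() for line in lines)
    # Only non-stop words reach the single Counter.
    return Counter(word for word in words if word not in stops)


lines = [
    "this is a concordance string something",
    "this is another concordance string blah",
]
count = count_without_stops(lines, {"this", "that", "a", "is"})
print(count.most_common(2))
```

Because `chain.from_iterable` and the generator expression are lazy, no intermediate list of all words is built before counting.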

Alternatively, the last line could be written as:

^{pr2}$

User

Answer 2 · edited 2024-10-02 16:29:14

You can remove the stop words while tokenizing:

from collections import Counter

stop_words = frozenset(['the', 'a', 'is'])

def mostCommonWords(concordanceList):
    finalCount = Counter()
    for line in concordanceList:
        # split() with no argument splits on any whitespace run, so no empty tokens
        words = [w for w in line.split() if w not in stop_words]
        finalCount.update(words)  # update the running count with this line's words
    return finalCount
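A detail worth checking when tokenizing: `str.split(" ")` splits on single spaces only, so repeated spaces produce empty-string tokens and a trailing newline stays attached to the last word, whereas `str.split()` with no argument splits on any run of whitespace. A quick comparison, using a made-up sample line:

```python
from collections import Counter

stop_words = frozenset(["the", "a", "is"])
line = "the  cat is a cat\n"  # two spaces after "the", trailing newline

# Splitting on a single space keeps an empty string and "cat\n" as tokens.
by_space = Counter(w for w in line.split(" ") if w not in stop_words)
# Splitting on arbitrary whitespace yields clean tokens.
by_whitespace = Counter(w for w in line.split() if w not in stop_words)

print(by_space)
print(by_whitespace)
```

With `split(" ")` the two occurrences of "cat" are counted as different words, which is usually not what a word count should do.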

User

Answer 3 · edited 2024-10-02 16:29:14

First, you don't need to create all those new Counter objects inside your function; you can do this instead:

for line in concordanceList:
    finalCount.update(line.split(" "))
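A small sketch to check that updating one Counter in place gives the same totals as building a separate Counter per line and adding them together. The sample data and the per-line pattern are hypothetical reconstructions for comparison, not code from the original question.

```python
from collections import Counter

concordance_list = [  # hypothetical sample input
    "this is a concordance string something",
    "this is another concordance string blah",
]

# Hypothetical per-line pattern: a new Counter for each line, summed up.
summed = Counter()
for line in concordance_list:
    summed += Counter(line.split())

# Suggested pattern: one Counter, updated in place with each line's words.
final_count = Counter()
for line in concordance_list:
    final_count.update(line.split())

print(final_count == summed)
```

`Counter.update` adds counts from an iterable of elements (rather than replacing values as `dict.update` does), so no temporary Counter objects are needed.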

Second, a Counter is a kind of dict, so you can delete entries from it directly:

for sword in stop_words:
    del finalCount[sword]

This won't raise an exception whether or not `sword` is in the Counter.
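A short demonstration of that difference between `Counter` and a plain `dict`: deleting a missing key from a Counter is silently ignored, while a dict raises `KeyError`. The counts and stop words here are made-up sample values.

```python
from collections import Counter

final_count = Counter({"concordance": 2, "string": 2, "this": 2, "is": 2})
stop_words = frozenset(["this", "is", "a"])  # "a" is not in the Counter

# Counter's del ignores missing keys, so "a" causes no KeyError.
for sword in stop_words:
    del final_count[sword]

print(final_count)

# A plain dict, by contrast, raises KeyError for a missing key.
plain = {"concordance": 2}
dict_raised = False
try:
    del plain["a"]
except KeyError:
    dict_raised = True
print(dict_raised)
```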
