Removing stopwords from a list

Posted 2024-10-01 22:28:19


I have this variable:

sent=[('include', 'details', 'about', 'your performance'),
('show', 'the', 'results,', 'which', 'you\'ve', 'got')]

I need to remove the unnecessary words (stopwords) from it. I tried:

output = [w for w in sent if not w in stop_words]

But it didn't work. What's wrong?


Tags: the, in, you, which, your, include, show, performance

3 answers
from nltk.corpus import stopwords

stop_words = {w.lower() for w in stopwords.words('english')}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]

If you want a single flat list of words with the stopwords removed:

>>> no_stop_words = [word for sentence in sent for word in sentence if word not in stop_words]
>>> no_stop_words
['include', 'details', 'performance', 'show', 'results,', 'got']

If you want to keep the sentences intact:

>>> sent_no_stop = [[word for word in sentence if word not in stop_words] for sentence in sent]
>>> sent_no_stop
[['include', 'details', 'performance'], ['show', 'results,', 'got']]

In most cases, however, you would work with one flat list of words (no parentheses):

sent = ['include', 'details', 'about', 'your', 'performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']

>>> no_stopwords = [word for word in sent if word not in stop_words]
>>> no_stopwords
['include', 'details', 'performance', 'show', 'results,', 'got']
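The comprehensions in this answer can be tried without downloading the NLTK corpus. The following is only a minimal sketch: `STOP_WORDS` is a small hand-picked stand-in, not NLTK's actual list, and `remove_stop_words` is a hypothetical helper name. It lowercases each word before the membership test, which seems to be the intent of building the set with `w.lower()`.

```python
# Hand-picked stand-in for NLTK's English stopword list (an assumption, not the real list).
STOP_WORDS = {"about", "your", "the", "which", "you've"}


def remove_stop_words(sentences, stop_words=STOP_WORDS):
    """Return one filtered list per sentence; the comparison ignores case."""
    return [[word for word in sentence if word.lower() not in stop_words]
            for sentence in sentences]


sent = [('include', 'details', 'about', 'your', 'performance'),
        ('Show', 'The', 'results,', 'which', "you've", 'got')]

# Sentences kept separate.
filtered = remove_stop_words(sent)
# One flat list of the remaining words.
flat = [word for sentence in filtered for word in sentence]

print(filtered)
print(flat)
```

Note that 'results,' is kept with its trailing comma, because only exact matches against the set are removed.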

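Are some quotes missing in your actual code? Make sure every string is closed, and escape apostrophes with a backslash if they sit inside a string delimited by the same kind of quote. I would also put each word in its own string, like this: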

sent=[('include', 'details', 'about', 'your', 'performance'), ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
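Why splitting matters: an entry such as 'your performance' is a single string, so it never equals the stopword 'your' and is left untouched. If the data cannot be fixed by hand, a sketch like the following could split such entries on whitespace first. `split_words` is a hypothetical helper and `STOP_WORDS` is a tiny stand-in set, both introduced only for illustration.

```python
# Tiny stand-in stopword set (an assumption, not NLTK's list).
STOP_WORDS = {"about", "your"}


def split_words(sentence):
    """Break multi-word strings such as 'your performance' into single words."""
    return [word for item in sentence for word in item.split()]


sentence = ('include', 'details', 'about', 'your performance')

# Without splitting, 'your performance' is compared as one string and survives.
unsplit_kept = [w for w in sentence if w not in STOP_WORDS]

# After splitting, 'your' is compared on its own and is removed.
words = split_words(sentence)
kept = [w for w in words if w not in STOP_WORDS]

print(unsplit_kept)
print(kept)
```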

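The parentheses are what break the iteration: looping over sent yields whole tuples rather than words, and a tuple never matches a stopword. If you can remove them: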

sent = ['include', 'details', 'about', 'your performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']
# stop_words is the stopword collection from the question
output = [w for w in sent if w not in stop_words]

If you can't, you can do the following:

sent = [('include', 'details', 'about', 'your performance'), ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
# filter each tuple, then flatten the filtered lists into one list
output = [word for filtered in [[w for w in tup if w not in stop_words] for tup in sent] for word in filtered]
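The failure of the original attempt and an alternative way to flatten can be seen in a short sketch. It assumes a small stand-in `STOP_WORDS` set rather than NLTK's list, and it swaps in `itertools.chain.from_iterable` for the flattening step, which is one possible choice and not the method used in this answer. The words are also written as separate strings here.

```python
from itertools import chain

# Small stand-in stopword set (an assumption, not NLTK's list).
STOP_WORDS = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# Iterating over sent yields whole tuples; no tuple is in the set,
# so every element passes the test and nothing is removed.
unchanged = [w for w in sent if w not in STOP_WORDS]

# Flatten the tuples into a stream of words first, then filter word by word.
output = [w for w in chain.from_iterable(sent) if w not in STOP_WORDS]

print(unchanged == sent)
print(output)
```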
