Stop words showing up among the most influential words

Posted 2024-09-30 00:29:45


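I am running some NLP code to find the most influential words (positive or negative) in a survey. My problem is that, although I successfully added some extra stop words to the NLTK stopwords list, they still show up later among the influential words.

I have a DataFrame whose first column contains the scores and whose second column contains the comments.

I added the extra stop words: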

from nltk.corpus import stopwords

stopwords = stopwords.words('english')
extra = ['Cat', 'Dog']
stopwords.extend(extra)

I checked with len before and after to confirm that they were added.

I wrote this function to remove punctuation and stop words from my comments:

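import string

def text_process(comment):
    # Drop punctuation characters, then rebuild the string
    nopunc = [char for char in comment if char not in string.punctuation]
    nopunc = ''.join(nopunc)
    # Keep only the words whose lowercase form is not a stop word
    return [word for word in nopunc.split() if word.lower() not in stopwords]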

I run the model (I am not including the whole code, since it makes no difference here):

from sklearn.feature_extraction.text import CountVectorizer

corpus = df['Comment']
y = df['Label']
vectorizer = CountVectorizer(analyzer=text_process)
x = vectorizer.fit_transform(corpus)

Then I get the most influential words:

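# nb is the fitted model from the code not shown here.
# Note: get_feature_names() was removed in scikit-learn 1.2;
# newer versions use get_feature_names_out() instead.
feature_to_coef = {word: coef for word, coef in zip(vectorizer.get_feature_names(), nb.coef_[0])}

for best_positive in sorted(
    feature_to_coef.items(),
    key=lambda x: x[1],
    reverse=True)[:20]:
    print(best_positive)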

However, Cat and Dog still appear in the results.

What am I doing wrong? Any ideas?

Many thanks.


1 Answer

#1 · Posted 2024-09-30 00:29:45

It looks like this happens because your 'Cat' and 'Dog' entries are capitalized.
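A minimal sketch of the mismatch, assuming a short stand-in list in place of the full NLTK one (the three base words here are made up for illustration). It shows why the len check passes while the filter still lets the word through:

```python
# Stand-in for stopwords.words('english') (assumption: a tiny hand-picked list).
stopwords = ['the', 'is', 'and']

before = len(stopwords)
stopwords.extend(['Cat', 'Dog'])
added = len(stopwords) - before        # the len() check passes: 2 entries were added

token = 'Cat'
# The filter lowercases the token first, so it compares 'cat' against 'Cat'.
kept = token.lower() not in stopwords  # no match, so the token is kept

print(added, kept)
```

So the entries really are in the list, they just never equal the lowercased token.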

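In your text_process function you have if word.lower() not in stopwords, which only works when the entries in stopwords are lowercase. Each word is lowercased before the comparison, so 'cat' never matches the entry 'Cat'. Lowercase the extra stop words (for example extra = ['cat', 'dog']) and they will be filtered out.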
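One possible way to apply this, sketched under some assumptions: base_stopwords is a small hypothetical stand-in for stopwords.words('english'), and stop_set is a name introduced here, not something from the question. Lowercasing every entry when building the set means the casing of the extra words no longer matters:

```python
import string

# Stand-in for nltk's stopwords.words('english') (assumption: tiny illustrative subset).
base_stopwords = ['the', 'is', 'a', 'and', 'my']
extra = ['Cat', 'Dog']

# Normalise every entry to lowercase; a set also gives fast membership tests.
stop_set = {w.lower() for w in base_stopwords + extra}

def text_process(comment):
    """Strip punctuation, then drop stop words regardless of their casing."""
    nopunc = ''.join(ch for ch in comment if ch not in string.punctuation)
    return [word for word in nopunc.split() if word.lower() not in stop_set]

print(text_process("My Cat is great, and the dog loves the garden!"))
# prints ['great', 'loves', 'garden']
```

The same function can still be passed as analyzer=text_process to CountVectorizer, as in the question. Note that the surviving tokens keep their original casing, so 'Great' and 'great' would remain separate features unless you also lowercase the returned words.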
