How do I get column sums from the matrix returned by sklearn's CountVectorizer?

Posted 2024-09-30 16:39:22


How can I get the sum of any given column in the term-frequency matrix returned by sklearn's CountVectorizer?

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()

corpus = [ 'This is a sentence',
           'Another sentence is here',
           'Wait for another sentence',
           'The sentence is coming',
           'The sentence has come'
         ]

x = vectorizer.fit_transform(corpus)

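For example, I want to find the frequency of sentence in the matrix, so I want the sum of the sentence column. I can't figure out how to do it:

For example, I tried x['sentence'].sum(), but that didn't help.
I also tried converting it to a DataFrame and computing the sum, but I don't want to convert this matrix into a DataFrame.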


1 Answer

User

#1 · Posted 2024-09-30 16:39:22

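You can try the following:

Get the position of the term in the list of feature names from the CountVectorizer (get_feature_names_out(); older scikit-learn releases called it get_feature_names())
Use that position to sum the corresponding column of the CSR matrix (x, in your case)

Code: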

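import numpy as np

term_to_sum = 'sentence'
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out()
# returns a NumPy array, so convert it to a list before calling .index()
index_term = list(vectorizer.get_feature_names_out()).index(term_to_sum)

s = np.sum(x[:, index_term])  # sum of the 'sentence' column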
