How do I interpret sparse matrix output?

    from sklearn.feature_extraction.text import CountVectorizer

    test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']
    bigram_vec = CountVectorizer(ngram_range=(1, 2))
    X = bigram_vec.fit_transform(test_sent)
    Xc = X.T @ X   # feature-by-feature product (written X.T * X for scipy.sparse matrices)
    print(Xc)

    from sklearn.feature_extraction.text import CountVectorizer

    test_sent = ['hello biggest awesome biggest biggest awesome today lively splendid awesome today']
    bigram_vec = CountVectorizer(ngram_range=(2, 2))
    X = bigram_vec.fit_transform(test_sent)
    print(bigram_vec.get_feature_names_out())   # older scikit-learn: get_feature_names()
    Xc = X.T @ X
    print(Xc)
    print()
    print(Xc.todense())

Output:

      (4, 0) 1  (2, 0) 2  (0, 0) 1  (3, 0) 1  (1, 0) 2  (7, 0) 1  (5, 0) 1  (6, 0) 1
      (4, 1) 2  (2, 1) 4  (0, 1) 2  (3, 1) 2  (1, 1) 4  (7, 1) 2  (5, 1) 2  (6, 1) 2
      (4, 2) 2  (2, 2) 4  (0, 2) 2  (3, 2) 2  (1, 2) 4  (7, 2) 2  (5, 2) 2  (6, 2) 2
      (4, 3) 1
      :  :
      (6, 4) 1
      (4, 5) 1  (2, 5) 2  (0, 5) 1  (3, 5) 1  (1, 5) 2  (7, 5) 1  (5, 5) 1  (6, 5) 1
      (4, 6) 1  (2, 6) 2  (0, 6) 1  (3, 6) 1  (1, 6) 2  (7, 6) 1  (5, 6) 1  (6, 6) 1
      (4, 7) 1  (2, 7) 2  (0, 7) 1  (3, 7) 1  (1, 7) 2  (7, 7) 1  (5, 7) 1  (6, 7) 1

    [[1 2 2 1 1 1 1 1]
     [2 4 4 2 2 2 2 2]
     [2 4 4 2 2 2 2 2]
     [1 2 2 1 1 1 1 1]
     [1 2 2 1 1 1 1 1]
     [1 2 2 1 1 1 1 1]
     [1 2 2 1 1 1 1 1]
     [1 2 2 1 1 1 1 1]]

1 answer

User
Answer #1 · posted 2024-05-19 00:00:48

First, check which feature names CountVectorizer is actually using:
    bigram_vec.get_feature_names_out()   # older scikit-learn: get_feature_names()
    # Out: array(['am', 'dont', 'hello', 'to', 'want'], dtype=object)
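As you can see, the word "i" is missing. That is because the default tokenizer only keeps tokens matching token_pattern, whose default is r"(?u)\b\w\w+\b", i.e. at least two word characters. From the documentation: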
token_pattern : string
Regular expression denoting what constitutes a “token”, only used if analyzer == 'word'. The default regexp select tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator).
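X itself is a 10 x 5 sparse matrix: one row per input string, one column per feature. Shown densely with labels, it reads: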
    columns:  'am' 'dont' 'hello' 'to' 'want'
    'hello'  [[ 0  0  1  0  0]
    'i'       [ 0  0  0  0  0]
    'am'      [ 1  0  0  0  0]
    'hello'   [ 0  0  1  0  0]
    'i'       [ 0  0  0  0  0]
    'dont'    [ 0  1  0  0  0]
    'want'    [ 0  0  0  0  1]
    'to'      [ 0  0  0  1  0]
    'i'       [ 0  0  0  0  0]
    'dont'    [ 0  1  0  0  0]]
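Now, when you compute X.T * X, entry (i, j) of the result is the sum over all input strings r of X[r, i] * X[r, j], i.e. how often features i and j occur in the same string. Because every input string here contains at most one token, no two different features ever share a row, so all off-diagonal entries are 0 and the diagonal holds how many times each feature occurs: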
    columns:  'am' 'dont' 'hello' 'to' 'want'
    'am'     [[1  0  0  0  0]
    'dont'    [0  2  0  0  0]
    'hello'   [0  0  2  0  0]
    'to'      [0  0  0  1  0]
    'want'    [0  0  0  0  1]]
If you expected something different, please add more details to your question.
