<p>Is this what you mean?</p>
<pre><code>In [13]: from sklearn.feature_extraction.text import CountVectorizer
In [14]: vectorize = CountVectorizer(min_df=1)
In [15]: document1 = "foo bar baz"
    ...: document2 = "bar bar baz dee"
    ...:
In [16]: documents = [document1, document2]
In [17]: d = vectorize.fit_transform(documents)
In [18]: vectorize.vocabulary_
Out[18]: {u'bar': 0, u'baz': 1, u'dee': 2, u'foo': 3}
In [19]: d.todense()
Out[19]:
matrix([[1, 1, 0, 1],
        [2, 1, 1, 0]], dtype=int64)
</code></pre>
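<p>To show how the vocabulary indices line up with the matrix columns in the session above, here is a minimal pure-Python sketch of the same counting. It is only an illustration, not how scikit-learn implements it: <code>count_matrix</code> is a hypothetical helper, and it assumes plain whitespace tokenization, whereas <code>CountVectorizer</code>'s default tokenizer is regex-based and ignores single-character tokens.</p>

```python
from collections import Counter


def count_matrix(documents):
    """Hypothetical helper (not part of scikit-learn): bag-of-words counts.

    Assumes lowercase + whitespace tokenization, which matches the
    example documents but not CountVectorizer's regex tokenizer in general.
    """
    tokenized = [doc.lower().split() for doc in documents]

    # Sort the unique terms so each term gets a stable column index.
    terms = sorted({token for tokens in tokenized for token in tokens})
    vocabulary = {term: index for index, term in enumerate(terms)}

    # One row per document, one column per vocabulary term.
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts.get(term, 0) for term in terms])
    return vocabulary, rows


vocabulary, rows = count_matrix(["foo bar baz", "bar bar baz dee"])
print(vocabulary)  # {'bar': 0, 'baz': 1, 'dee': 2, 'foo': 3}
print(rows)        # [[1, 1, 0, 1], [2, 1, 1, 0]]
```

<p>Each column index comes from the term's position in the sorted vocabulary, so column 0 counts <code>bar</code>, which appears once in the first document and twice in the second.</p>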