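I want to cluster the 20 newsgroups texts with the pyclustering library (https://codedocs.xyz/annoviko/pyclustering/classpyclustering_1_1cluster_1_1cure_1_1cure.html#details), for example with CURE. As far as I understand, it needs input like this: [[0.1,0.5],[0.3,0.1], ...]. Can I produce that with scikit-learn's TfidfVectorizer or some other method? Are the required values the ones in the parentheses that the vectorizer prints (e.g. (338615161))?
So far I have tried feeding the vectorizer output to it, but it did not work. My code so far: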
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.cure import cure

categories = [
    'alt.atheism',
    'talk.religion.misc',
    'comp.graphics',
    'sci.space',
]
print("Loading 20 newsgroups dataset for categories:")
print(categories)
dataset = fetch_20newsgroups(subset='all', categories=categories,
                             shuffle=True, random_state=42)
# Build a sparse TF-IDF matrix: one row per document, one column per term.
vectorizer = TfidfVectorizer(max_df=0.5, min_df=2, stop_words='english')
X = vectorizer.fit_transform(dataset.data)
print(X)
# Convert the sparse matrix to a dense array.
X = X.toarray()
# Allocate 100 clusters.
cure_instance = cure(X, 100)
cure_instance.process()
clusters = cure_instance.get_clusters()
# Visualize allocated clusters.
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, X)
visualizer.show()
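A minimal sketch of one possible way to get input in the [[0.1,0.5],[0.3,0.1], ...] shape, assuming a dimensionality reduction step is acceptable: TfidfVectorizer returns a sparse matrix (printing it lists entries as "(row, column) weight"), and TruncatedSVD can project it to a few dense dimensions that are then turned into a plain list of lists. The tiny in-memory corpus and the choice of two components are illustrative assumptions, not part of the original code.

```python
from scipy.sparse import issparse
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in corpus; the question uses fetch_20newsgroups instead.
docs = [
    "the rocket launch reached orbit around the moon",
    "nasa plans a new orbit mission to the moon",
    "the shuttle launch was delayed by nasa",
    "rendering 3d graphics with opengl shaders",
    "opengl shaders speed up 3d image rendering",
    "image graphics formats such as png and jpeg",
]

# Sparse TF-IDF matrix: one row per document, one column per term.
vectorizer = TfidfVectorizer(stop_words="english")
X_sparse = vectorizer.fit_transform(docs)

# Project to 2 dense dimensions (assumed value, chosen so the points can be plotted).
svd = TruncatedSVD(n_components=2, random_state=42)
X_dense = svd.fit_transform(X_sparse)

# Plain nested Python lists, i.e. the [[x, y], [x, y], ...] shape.
points = X_dense.tolist()
print(len(points), len(points[0]))
```

The resulting `points` could then be tried as the data argument of `cure(points, number_of_clusters)`. Whether two components keep enough structure for useful clusters is something to check on the real data.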
I also just tried clustering the texts with sklearn's Birch. Right now the process simply gets killed.
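If the process is being killed because it runs out of memory (an assumption, since the post does not say why), one thing to try is reducing the TF-IDF matrix before handing it to Birch. The sketch below uses TruncatedSVD followed by Normalizer. The helper name `cluster_texts`, the sample corpus, and all parameter values are hypothetical.

```python
from sklearn.cluster import Birch
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def cluster_texts(texts, n_clusters, n_components):
    """Hypothetical helper: TF-IDF, then SVD reduction, then Birch labels."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    # Reduce to a few dense dimensions and rescale each row to unit length.
    reducer = make_pipeline(
        TruncatedSVD(n_components=n_components, random_state=42),
        Normalizer(copy=False),
    )
    reduced = reducer.fit_transform(tfidf)
    # threshold is an assumed value; it controls how tight the subclusters are.
    model = Birch(n_clusters=n_clusters, threshold=0.5)
    return model.fit_predict(reduced)


# Hypothetical stand-in corpus; replace with dataset.data from the question.
docs = [
    "the rocket launch reached orbit around the moon",
    "nasa plans a new orbit mission to the moon",
    "the shuttle launch was delayed by nasa",
    "rendering 3d graphics with opengl shaders",
    "opengl shaders speed up 3d image rendering",
    "image graphics formats such as png and jpeg",
]
labels = cluster_texts(docs, n_clusters=2, n_components=2)
print(labels)
```

On the full dataset a larger `n_components` (for example around 100) would probably be a more reasonable starting point than 2, but that value is a guess to tune.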
No answers yet.