Text classification in NLTK with custom features - Q&A - Python中文网

Posted 2024-10-01 09:42:10


I have a dataset that looks like this:

featureDict = {identifier1: [[first 3-gram], [second 3-gram], ... [last 3-gram]],
               ...
               identifierN: [[first 3-gram], [second 3-gram], ... [last 3-gram]]}

I also have a label for each document in the same collection (the snippet showing the labels is missing from this page).

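I want to find the most suitable NLTK container, one where I can store all of this information in a single place and apply NLTK classifiers to it seamlessly.

Also, before running any classifier on this dataset, I would like to apply a tf-idf filter to the feature space.

Pointers to references and documentation would be very helpful.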


1 Answer

User

#1 · Posted 2024-10-01 09:42:10

All you need is a plain dict. Take a look at the snippet in "NLTK classify interface using trained classifier".
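
A minimal sketch of what "a plain dict" could look like for this data, assuming each 3-gram is a list of three tokens and the labels live in a second dict keyed by the same identifiers. The helper names, the `has(...)` feature-name format, and the toy data are illustrative assumptions, not taken from the question or the linked snippet. NLTK classifiers train on a list of `(feature_dict, label)` pairs, so no special container is needed.

```python
def to_featureset(trigrams):
    """Map a document's 3-grams to a bag-of-3-grams feature dict."""
    # Feature names must be hashable, so each 3-gram is joined into a string.
    return {"has(" + " ".join(gram) + ")": True for gram in trigrams}


def build_labeled_featuresets(feature_dict, label_dict):
    """Pair each document's feature dict with its label."""
    return [
        (to_featureset(trigrams), label_dict[doc_id])
        for doc_id, trigrams in feature_dict.items()
        if doc_id in label_dict  # skip documents that have no label
    ]


def train_naive_bayes(labeled_featuresets):
    """Train NLTK's Naive Bayes classifier on (feature_dict, label) pairs."""
    import nltk  # imported here so the helpers above work without NLTK
    return nltk.NaiveBayesClassifier.train(labeled_featuresets)


# Toy data in the shape described in the question (all values are made up).
feature_dict = {
    "doc1": [["the", "cat", "sat"], ["cat", "sat", "down"]],
    "doc2": [["stocks", "fell", "today"]],
}
label_dict = {"doc1": "animals", "doc2": "finance"}

labeled = build_labeled_featuresets(feature_dict, label_dict)
```

With NLTK installed, `classifier = train_naive_bayes(labeled)` followed by `classifier.classify(to_featureset([["stocks", "fell", "today"]]))` returns one of the training labels.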

The reference documentation for this is still the NLTK book: http://nltk.org/book/ch06.html and the API specification: http://nltk.org/api/nltk.classify.html

Here are some pages that may help you: http://snipperize.todayclose.com/snippet/py/Use-NLTK-Toolkit-to-Classify-Documents 5671027/, http://streamhacker.com/tag/feature-extraction/, http://web2dot5.wordpress.com/2012/03/21/text-classification-in-python/.

Also, keep in mind that NLTK offers only a limited set of classifier algorithms. For more advanced exploration, you are better off using scikit-learn.
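
For the tf-idf step the question asks about, one possible route is NLTK's scikit-learn wrapper, `nltk.classify.scikitlearn.SklearnClassifier`, which accepts a scikit-learn `Pipeline`. The sketch below assumes both libraries are installed; the helper names, the choice of `MultinomialNB`, and the toy data are illustrative assumptions. Note that `TfidfTransformer` reweights features rather than removing them, so an explicit feature-selection step would be needed to actually filter.

```python
from collections import Counter


def trigram_counts(trigrams):
    """Count each 3-gram in a document; tf-idf needs counts, not booleans."""
    return dict(Counter(" ".join(gram) for gram in trigrams))


def train_tfidf_classifier(labeled_counts):
    """Train a tf-idf + Multinomial Naive Bayes pipeline through NLTK."""
    # Lazy imports: these require nltk and scikit-learn to be installed.
    from nltk.classify.scikitlearn import SklearnClassifier
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ("tfidf", TfidfTransformer()),  # reweight raw counts by tf-idf
        ("nb", MultinomialNB()),        # classifier choice is an assumption
    ])
    # SklearnClassifier vectorizes the feature dicts before fitting.
    return SklearnClassifier(pipeline).train(labeled_counts)


# Toy (count_dict, label) pairs; documents and labels are made up.
labeled_counts = [
    (trigram_counts([["the", "cat", "sat"], ["the", "cat", "sat"]]), "animals"),
    (trigram_counts([["stocks", "fell", "today"]]), "finance"),
]
```

The trained object exposes the usual NLTK interface, so `train_tfidf_classifier(labeled_counts).classify(trigram_counts(new_trigrams))` works the same way as with a native NLTK classifier.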
