I am trying to generate PMML (using jpmml-sklearn) for a text classification pipeline. The last line of the code, sklearn2pmml(Textpipeline, "TextMiningClassifier.pmml", with_repr = True), crashes.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn2pmml import PMMLPipeline
categories = [
    'alt.atheism',
    'talk.religion.misc',
]
print("Loading 20 newsgroups dataset for categories:")
print(categories)
data = fetch_20newsgroups(subset='train', categories=categories)
print("%d documents" % len(data.filenames))
print("%d categories" % len(data.target_names))
Textpipeline = PMMLPipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])
Textpipeline.fit(data.data, data.target)
from sklearn2pmml import sklearn2pmml
sklearn2pmml(Textpipeline, "TextMiningClassifier.pmml", with_repr = True)
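sklearn2pmml() does not seem to accept Textpipeline as input. The code works for other pipelines (examples: https://github.com/jpmml/sklearn2pmml), but not for the text classification pipeline above. So my question is: how can I generate PMML for a text classification problem?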
The error I get:
[error output not preserved in the source]
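You need to use a PMML-compatible text tokenization function. The default implementation is the class sklearn2pmml.feature_extraction.text.Splitter. More details and references can be found on the JPMML mailing list: https://groups.google.com/forum/#!topic/jpmml/wi-0rxzUn1o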