Generating PMML for a text classification pipeline in Python

Posted 2024-05-19 20:54:31


I am trying to generate PMML (using jpmml-sklearn) for a text classification pipeline. The last line of the code, sklearn2pmml(Textpipeline, "TextMiningClassifier.pmml", with_repr = True), crashes.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn2pmml import PMMLPipeline

categories = [
    'alt.atheism',
    'talk.religion.misc',
]

print("Loading 20 newsgroups dataset for categories:")
print(categories)
data = fetch_20newsgroups(subset='train', categories=categories)
print("%d documents" % len(data.filenames))
print("%d categories" % len(data.target_names))

Textpipeline = PMMLPipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])

Textpipeline.fit(data.data, data.target)

from sklearn2pmml import sklearn2pmml

sklearn2pmml(Textpipeline, "TextMiningClassifier.pmml", with_repr = True)

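sklearn2pmml() does not seem to accept Textpipeline as input. The same code works for other pipelines (example: https://github.com/jpmml/sklearn2pmml), but not for the text classification pipeline above. So my question is: how do I generate PMML for a text classification problem?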

The error I get:

[error output missing from the source]

1 answer
User
Answer 1 · Posted 2024-05-19 20:54:31

You need to use a PMML-compatible text tokenization function. The default implementation is the class sklearn2pmml.feature_extraction.text.Splitter:

from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn2pmml.feature_extraction.text import Splitter
vectorizer = TfidfVectorizer(analyzer = "word", token_pattern = None, tokenizer = Splitter())
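
The snippet above turns off scikit-learn's default token matching (token_pattern = None) and passes a split-based tokenizer instead. To see roughly how the two approaches differ, here is a minimal sketch in plain Python. SimpleSplitter is a hypothetical stand-in written only for illustration; it is not the actual sklearn2pmml Splitter, whose exact behaviour (for example, punctuation handling) may differ. The regular expression shown for the default is scikit-learn's documented default token_pattern.

```python
import re

# scikit-learn's documented default token_pattern for CountVectorizer/TfidfVectorizer:
# tokens are *matched* as runs of two or more word characters.
SKLEARN_DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


class SimpleSplitter:
    """Hypothetical split-based tokenizer (illustration only, not sklearn2pmml's Splitter)."""

    def __init__(self, separator_re=r"\s+"):
        # Assumption: tokens are separated by whitespace.
        self.separator_re = re.compile(separator_re)

    def __call__(self, text):
        # Split on the separator and drop empty strings caused by
        # leading or trailing separators.
        return [token for token in self.separator_re.split(text) if token]


def match_tokens(text):
    """Tokenize by matching, the way the default token_pattern is applied."""
    return re.findall(SKLEARN_DEFAULT_TOKEN_PATTERN, text)


text = "PMML is an XML-based format."
split_tokens = SimpleSplitter()(text)
matched_tokens = match_tokens(text)

print(split_tokens)    # tokens produced by splitting on whitespace
print(matched_tokens)  # tokens produced by matching the default pattern
```

The split-based tokenizer keeps "XML-based" and "format." as single tokens, while the matching approach yields "XML", "based" and "format". Any callable that takes a string and returns a list of tokens can be passed as the tokenizer argument of a scikit-learn vectorizer, but whether a given tokenizer can be exported to PMML depends on what the converter supports.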

More details and references can be found on the JPMML mailing list: https://groups.google.com/forum/#!topic/jpmml/wi-0rxzUn1o
