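I created a text classifier with sklearn that uses Tf-Idf, and I want to use BERT and Elmo embeddings instead of Tf-Idf.
How would one do that?
I'm getting the Bert embeddings with the code below: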
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings
# init embedding
embedding = TransformerWordEmbeddings('bert-base-uncased')
# create a sentence
sentence = Sentence('The grass is green .')
# embed words in sentence
embedding.embed(sentence)
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
column_trans = ColumnTransformer([
('tfidf', TfidfVectorizer(), 'text'),
('number_scaler', MinMaxScaler(), ['number'])
])
# Initialize data
data = [
['This process, however, afforded me no means of.', 20, 1],
['another long description', 21, 1],
['It never once occurred to me that the fumbling', 19, 0],
['How lovely is spring As we looked from Windsor', 18, 0]
]
# Create DataFrame
df = pd.DataFrame(data, columns=['text', 'number', 'target'])
X = column_trans.fit_transform(df)
X = X.toarray()
y = df.loc[:, "target"].values
# Perform classification
classifier = LogisticRegression(random_state=0)
classifier.fit(X, y)
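Sklearn lets you write custom data transformers (these are unrelated to the "transformers" machine learning models).
I implemented a custom sklearn data transformer that uses the flair library you are already using. Note that I used TransformerDocumentEmbeddings instead of TransformerWordEmbeddings, since the classifier needs one vector per document rather than one per token. There is also a version that works with the transformers library.
I have added an SO question, here, that discusses which transformer layer is worth using.
I'm not familiar with Elmo, though I found this, which uses tensorflow. You may be able to modify the code I shared to make Elmo work.
In your case, replace your column transformer with the following: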
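A minimal sketch of how such a transformer and the replacement column transformer might look, assuming flair's `TransformerDocumentEmbeddings` and the `text` / `number` / `target` columns from the question. The names `make_flair_embed_fn`, `TextEmbeddingTransformer`, `build_column_transformer` and `train_classifier` are made up for illustration; they are not part of flair or sklearn. The flair import is deferred to the helper so that any other function mapping a string to a fixed-length vector (an Elmo wrapper, for example) could be plugged in instead.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler


def make_flair_embed_fn(model_name="bert-base-uncased"):
    """Return a function str -> 1-D numpy vector backed by flair.

    Hypothetical helper: flair is imported here, not at module level,
    so the rest of the code does not depend on it.
    """
    from flair.data import Sentence
    from flair.embeddings import TransformerDocumentEmbeddings

    embedder = TransformerDocumentEmbeddings(model_name)

    def embed(text):
        sentence = Sentence(text)
        embedder.embed(sentence)
        # One vector for the whole document, converted from a torch tensor
        return sentence.get_embedding().detach().cpu().numpy()

    return embed


class TextEmbeddingTransformer(TransformerMixin, BaseEstimator):
    """Custom sklearn transformer: column of strings -> dense embedding matrix."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn

    def fit(self, X, y=None):
        # Nothing to learn: the embedding model is already pretrained
        return self

    def transform(self, X):
        # Stack one embedding per text into an (n_samples, dim) array
        return np.vstack([np.asarray(self.embed_fn(text), dtype=float) for text in X])


def build_column_transformer(embed_fn):
    """Same layout as the Tf-Idf version, with the text step swapped out."""
    return ColumnTransformer([
        ("embedding", TextEmbeddingTransformer(embed_fn), "text"),
        ("number_scaler", MinMaxScaler(), ["number"]),
    ])


def train_classifier(df, embed_fn):
    """Fit the column transformer and a logistic regression on df."""
    column_trans = build_column_transformer(embed_fn)
    X = column_trans.fit_transform(df)  # dense ndarray, so no .toarray() call
    y = df.loc[:, "target"].values
    classifier = LogisticRegression(random_state=0)
    classifier.fit(X, y)
    return column_trans, classifier
```

With the `df` from the question, `column_trans, classifier = train_classifier(df, make_flair_embed_fn())` should fit the model; flair downloads `bert-base-uncased` on first use. Because every step now returns a dense array, the `X = X.toarray()` line from the Tf-Idf version is no longer needed.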