I have a dataset like this:
Description | attributes.occasion.0 | attributes.occasion.1 | attributes.occasion.2 | attributes.occasion.3 | attributes.occasion.4
descr01     | Chanukah              | Christmas             | Housewarming          | Just Because          | Thank You
descr02     | Anniversary           | Birthday              | Christmas             | Graduation            | Mother's Day
descr03     | Chanukah              | Christmas             | Housewarming          | Just Because          | Thank You
descr04     | Baby Shower           | Birthday              | Cinco de Mayo         | Gametime              | Just Because
descr05     | Anniversary           | Birthday              | Christmas             | Graduation            | Mother's Day
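descr01 => a description of the occasions (here I have only put a short name; in the actual dataset it is the full-text description), and so on.
In the dataset above, I have one independent variable, the text description, and five dependent variables (attributes.occasion.0 to attributes.occasion.4).
I tried a random forest classifier, which accepts multiple dependent variables (a multi-output target) as input.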
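The occasion columns appear to be filled in alphabetical order, so the same occasion can land in different columns (Christmas is attributes.occasion.1 for descr01 but attributes.occasion.2 for descr02). One common reformulation, not what the code in this post does, is to treat each row as an unordered set of occasions and encode it as a 0/1 indicator matrix with scikit-learn's MultiLabelBinarizer. A minimal sketch, assuming missing slots are NaN; the three-row frame is a toy copy of the table above:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy frame mirroring the layout above (descriptions are placeholders)
occasion_cols = [f"attributes.occasion.{i}" for i in range(5)]
df = pd.DataFrame(
    [
        ["descr01", "Chanukah", "Christmas", "Housewarming", "Just Because", "Thank You"],
        ["descr02", "Anniversary", "Birthday", "Christmas", "Graduation", "Mother's Day"],
        ["descr04", "Baby Shower", "Birthday", "Cinco de Mayo", "Gametime", "Just Because"],
    ],
    columns=["Description"] + occasion_cols,
)

# Each row becomes an unordered set of occasions; missing (NaN) slots are skipped
label_sets = [{v for v in row if pd.notna(v)} for row in df[occasion_cols].values.tolist()]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)  # shape (n_rows, n_distinct_occasions), 0/1 entries

print(mlb.classes_)  # sorted occasion names, one per column of Y
print(Y)
```

With this encoding, Christmas always maps to the same column of Y regardless of where it appeared in the original row, and Y can be passed as the target of a classifier that supports multi-label output.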
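For reference, scikit-learn's RandomForestClassifier supports multi-output targets natively: given a 2-D y, each tree predicts all target columns at once, and predict returns one column per target. A minimal sketch; the toy texts and labels are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy descriptions and two target columns
texts = ["menorah candle gift", "cake for her big day",
         "candle gift for the new home", "balloons and cake"]
y = np.array([
    ["Chanukah", "Christmas"],
    ["Anniversary", "Birthday"],
    ["Chanukah", "Housewarming"],
    ["Anniversary", "Birthday"],
])

X = TfidfVectorizer().fit_transform(texts)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

pred = clf.predict(X)
print(pred.shape)  # (4, 2): one prediction per sample per output column
print(clf.classes_)  # a list with one array of class labels per output column
```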
Below is the code I tried:
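## Imports
import joblib
import pandas as pd
import scipy.sparse
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.ensemble import RandomForestClassifier

## df is the DataFrame shown above (assumed already loaded)
target_cols = ['attributes.occasion.0', 'attributes.occasion.1', 'attributes.occasion.2',
               'attributes.occasion.3', 'attributes.occasion.4']
# Text columns to vectorize (only 'Description' here)
cols_to_retain = ['Description']

## Split the dataset (keep X as a DataFrame so columns can be looked up by name)
X_train, X_test, y_train, y_test = train_test_split(
    df[cols_to_retain], df[target_cols], test_size=0.3, random_state=0)

## Apply the model
# alternate_sign=False replaces non_negative=True, which newer scikit-learn removed
tfidf = Pipeline([('vect', HashingVectorizer(ngram_range=(1, 7), alternate_sign=False)),
                  ('tfidf', TfidfTransformer()),
                  ])

def feature_combine(dataset):
    # Fit TF-IDF on the training text of each column and stack the matrices side by side
    Xall = []
    for col in cols_to_retain:
        Xall.append(tfidf.fit_transform(dataset[col].astype(str)))
    joblib.dump(tfidf, "tfidf.sav")
    return scipy.sparse.hstack(Xall).tocsr()

def test_Data_text_transform_and_combine(dataset):
    # Reuse the already-fitted TF-IDF on the test text (no refitting)
    Xall = []
    for col in cols_to_retain:
        Xall.append(tfidf.transform(dataset[col].astype(str)))
    return scipy.sparse.hstack(Xall).tocsr()

text_clf = RandomForestClassifier()
_ = text_clf.fit(feature_combine(X_train), y_train)
RF_predicted = text_clf.predict(test_Data_text_transform_and_combine(X_test))

# Accuracy of each target column, in percent
print(pd.Series((RF_predicted == y_test.values).mean(axis=0) * 100, index=target_cols))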
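When I compute the accuracy, I get the output below. However, I don't know how to interpret this result, or how to plot a confusion matrix and other performance metrics.
Output: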
Accuracy for each dependent
attributes.occasion.0 87.517672
attributes.occasion.1 96.050306
attributes.occasion.2 98.362394
attributes.occasion.3 99.184142
attributes.occasion.4 99.564090
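The numbers above are per-column accuracies: each one counts a column as correct whenever that single slot matches, so they can be high even when many rows contain at least one wrong occasion. A stricter figure is exact-match accuracy, where a row only counts if every column is right. A confusion matrix is defined per output column, so one way is to compute it column by column. A minimal sketch on made-up labels (4 rows, 2 columns):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for 4 samples x 2 output columns
y_true = np.array([["Chanukah", "Christmas"], ["Anniversary", "Birthday"],
                   ["Chanukah", "Housewarming"], ["Anniversary", "Birthday"]])
y_pred = np.array([["Chanukah", "Christmas"], ["Chanukah", "Birthday"],
                   ["Chanukah", "Christmas"], ["Anniversary", "Birthday"]])

matches = y_true == y_pred
per_column = matches.mean(axis=0) * 100        # accuracy of each column, in percent
exact_match = matches.all(axis=1).mean() * 100  # the whole row must be right

# Confusion matrix for a single output column (column 0); rows = true, cols = predicted
labels0 = np.unique(np.concatenate([y_true[:, 0], y_pred[:, 0]]))
cm0 = confusion_matrix(y_true[:, 0], y_pred[:, 0], labels=labels0)

print(per_column)   # [75. 75.]
print(exact_match)  # 50.0
print(labels0)
print(cm0)
```

Here both columns score 75% on their own, but only half of the rows are entirely correct. To draw a matrix, scikit-learn's ConfusionMatrixDisplay(cm0, display_labels=labels0).plot() works if matplotlib is installed.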
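Can anyone tell me how to handle a multi-label problem like this and how to evaluate the model's performance? What approaches are possible in this case? I am using Python's scikit-learn library.
Thanks, Niranjan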
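If the occasions are treated as a set per description (a 0/1 indicator matrix, as produced by MultiLabelBinarizer), scikit-learn's multi-label metrics apply directly: hamming_loss (fraction of wrong label slots), f1_score with average="micro" or "macro", and multilabel_confusion_matrix (one 2x2 matrix per label). A minimal sketch on made-up label sets; this swaps the post's per-column setup for a multi-label encoding:

```python
from sklearn.metrics import f1_score, hamming_loss, multilabel_confusion_matrix
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical true and predicted occasion sets for three descriptions
true_sets = [{"Chanukah", "Christmas"}, {"Birthday", "Graduation"}, {"Birthday"}]
pred_sets = [{"Chanukah", "Christmas"}, {"Birthday"}, {"Birthday", "Christmas"}]

mlb = MultiLabelBinarizer()
mlb.fit(true_sets + pred_sets)  # shared, sorted label vocabulary
Y_true = mlb.transform(true_sets)
Y_pred = mlb.transform(pred_sets)

hl = hamming_loss(Y_true, Y_pred)  # share of label slots predicted wrongly
micro = f1_score(Y_true, Y_pred, average="micro", zero_division=0)  # pools all labels
macro = f1_score(Y_true, Y_pred, average="macro", zero_division=0)  # averages per label

# One 2x2 matrix [[TN, FP], [FN, TP]] per label, in mlb.classes_ order
mcm = multilabel_confusion_matrix(Y_true, Y_pred)

print(mlb.classes_)
print(hl, micro, macro)
print(mcm)
```

Micro F1 is dominated by frequent occasions, while macro F1 weights every occasion equally, so rare occasions that are never predicted pull it down. RandomForestClassifier also accepts such an indicator matrix as y directly.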