问题是,我从Kerasmodel.fit
历史中得到的validation accuracy
值明显高于我从sklearn.metrics
函数得到的validation accuracy
度量。在
我从model.fit
得到的结果总结如下:
Last Validation Accuracy: 0.81
Best Validation Accuracy: 0.84
来自sklearn
的结果(标准化)非常不同:
我认为这个问题和这个问题有点相似Sklearn metrics values are very different from Keras values 但是我检查了两种方法都是在同一个数据池上进行验证的,所以这个答案可能不适合我的情况。在
另外,这个问题Keras binary accuracy metric gives too high accuracy似乎解决了一些问题,即二进制交叉熵影响多类问题,但在我的例子中,它可能不适用,因为它是一个真正的二进制分类问题。在
以下是使用的命令:
模型定义:
inputs = Input((Tx, ))
n_e = 30
embeddings = Embedding(n_x, n_e, input_length=Tx)(inputs)
out = Bidirectional(LSTM(32, recurrent_dropout=0.5, return_sequences=True))(embeddings)
out = Bidirectional(LSTM(16, recurrent_dropout=0.5, return_sequences=True))(out)
out = Bidirectional(LSTM(16, recurrent_dropout=0.5))(out)
out = Dense(3, activation='softmax')(out)
modelo = Model(inputs=inputs, outputs=out)
modelo.summary()
模型摘要:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 100) 0
_________________________________________________________________
embedding (Embedding) (None, 100, 30) 86610
_________________________________________________________________
bidirectional (Bidirectional (None, 100, 64) 16128
_________________________________________________________________
bidirectional_1 (Bidirection (None, 100, 32) 10368
_________________________________________________________________
bidirectional_2 (Bidirection (None, 32) 6272
_________________________________________________________________
dense (Dense) (None, 3) 99
=================================================================
Total params: 119,477
Trainable params: 119,477
Non-trainable params: 0
_________________________________________________________________
模型编译:
mymodel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
模特试穿电话:
num_epochs = 30
myhistory = mymodel.fit(X_pad, y, epochs=num_epochs, batch_size=50, validation_data=[X_val_pad, y_val_oh], shuffle=True, callbacks=callbacks_list)
模型拟合日志:
Train on 505 samples, validate on 127 samples
Epoch 1/30
500/505 [============================>.] - ETA: 0s - loss: 0.6135 - acc: 0.6667
[...]
Epoch 10/30
500/505 [============================>.] - ETA: 0s - loss: 0.1403 - acc: 0.9633
Epoch 00010: val_acc improved from 0.77953 to 0.79528, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 41ms/sample - loss: 0.1393 - acc: 0.9637 - val_loss: 0.5203 - val_acc: 0.7953
Epoch 11/30
500/505 [============================>.] - ETA: 0s - loss: 0.0865 - acc: 0.9840
Epoch 00011: val_acc did not improve from 0.79528
505/505 [==============================] - 21s 41ms/sample - loss: 0.0860 - acc: 0.9842 - val_loss: 0.5257 - val_acc: 0.7953
Epoch 12/30
500/505 [============================>.] - ETA: 0s - loss: 0.0618 - acc: 0.9900
Epoch 00012: val_acc improved from 0.79528 to 0.81102, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0615 - acc: 0.9901 - val_loss: 0.5472 - val_acc: 0.8110
Epoch 13/30
500/505 [============================>.] - ETA: 0s - loss: 0.0415 - acc: 0.9940
Epoch 00013: val_acc improved from 0.81102 to 0.82152, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0413 - acc: 0.9941 - val_loss: 0.5853 - val_acc: 0.8215
Epoch 14/30
500/505 [============================>.] - ETA: 0s - loss: 0.0443 - acc: 0.9933
Epoch 00014: val_acc did not improve from 0.82152
505/505 [==============================] - 21s 42ms/sample - loss: 0.0453 - acc: 0.9921 - val_loss: 0.6043 - val_acc: 0.8136
Epoch 15/30
500/505 [============================>.] - ETA: 0s - loss: 0.0360 - acc: 0.9933
Epoch 00015: val_acc improved from 0.82152 to 0.84777, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0359 - acc: 0.9934 - val_loss: 0.5663 - val_acc: 0.8478
[...]
Epoch 30/30
500/505 [============================>.] - ETA: 0s - loss: 0.0039 - acc: 1.0000
Epoch 00030: val_acc did not improve from 0.84777
505/505 [==============================] - 20s 41ms/sample - loss: 0.0039 - acc: 1.0000 - val_loss: 0.8340 - val_acc: 0.8110
sklearn的混淆矩阵:
from sklearn.metrics import confusion_matrix
conf_mat = confusion_matrix(y_values, predicted_values)
预测值和金值确定如下:
preds = mymodel.predict(X_val)
preds_ints = [[el] for el in np.argmax(preds, axis=1)]
values_pred = tokenizer_y.sequences_to_texts(preds_ints)
values_gold = tokenizer_y.sequences_to_texts(y_val)
最后,我想补充一点,我已经打印出数据和所有的预测误差,我相信sklearn值更可靠,因为它们似乎与我从打印保存的“最佳”模型的预测中得到的结果相匹配。在
另一方面,我不明白这些指标怎么会如此不同。因为他们都是非常有名的软件,我断定我是犯错误的人,但我不能确定在哪里或如何。在
你的问题是不适定的;正如已经说过的,你没有计算你的scikit学习模型的实际精度,因此你似乎在比较苹果和桔子。从一个标准化的混淆矩阵计算(TP+TN)/2不能给出准确度。下面是一个简单的使用玩具数据的除臭剂,从docs改编
plot_confusion_matrix
:计算标准化混淆矩阵得到:
^{pr2}$根据您的错误理由,准确度应为:
(注意在标准化矩阵中,行相加为100%,这在完全混淆矩阵中是不会发生的)
但现在让我们看看非标准化混淆矩阵的实际精确度是什么:
其中,根据精度定义为(TP+TN)/(TP+TN+FP+FN),我们得到:
当然,我们不需要混淆矩阵来获得准确度这样简单的东西;正如评论中已经建议的那样,我们可以简单地使用scikit learn的内置
accuracy_score
方法:毫不奇怪,这与我们从混淆矩阵直接计算的结果是一致的。在
底线:
accuracy_score
),那么最好使用它们而不是特别的灵感,尤其是当某些东西看起来不对劲时(比如Keras和scikit learn报告的准确度之间的差异)相关问题 更多 >
编程相关推荐