I have the following training and test datasets:
train: 2000 files
hit.txt
nohit.txt
hit.txt
Test: 1500
hit.txt
nohit.txt
hit.txt
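I trained a model that reaches 74% accuracy; the code is as follows:
But when I run predictions on the test dataset, the scores come back as an array in a form I don't want: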
test_dir = 'test/'
dictionary = make_dic(test_dir)
features_, labels_ = make_dataset(dictionary)
calibrated_pred_final = calibrated_clf_pipe.predict(features_)
calibrated_pred_final
array([1, 1, 1, ..., 1, 1, 0])
test_pred_final = calibrated_clf_pipe.predict_proba(features_)
import numpy as np
batch_y = np.array(test_pred_final).flatten()
f = open('scores.txt', 'w')
for i in range(len(batch_y)):
    f.write(str(batch_y))
The scores.txt file looks like this:
[0.38636364 0.61363636 0.05147059 ... 0.61363636 0.86734694 0.13265306][0.38636364 0.61363636 0.05147059 ... 0.61363636 0.86734694 0.13265306] ... and this goes on forever.
What I expect is one score per test email, like this:
0.38636364
0.61363636
0.05147059
...
...
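I'm not sure what went wrong. Why do I get the scores as an array that is repeated 1500 times? I assume this array holds the score for each email, but how do I get rid of the whole-array output and end up with just one score per email, as in my expected result?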
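The predictions are as expected. The problem is in how you write scores.txt. You never index into batch_y, so every iteration of the loop writes the entire array, once per element. Writing batch_y[i] followed by a newline, instead of batch_y, gives one value per line. If you need probabilities, keep using predict_proba. If you want 0/1 predictions, use predict.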
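The fix can be sketched as follows. One detail worth checking first: predict_proba returns one row per sample and one column per class, so flattening it interleaves the class 0 and class 1 probabilities (in the output above, 0.38636364 and 0.61363636 sum to 1). The sketch below assumes class 1 is the class you want a score for and keeps only that column. The test_pred_final values are a hypothetical stand-in for the real predict_proba output, written as plain lists so the example runs without the trained model; the same loop works unchanged on a NumPy array.

```python
import os
import tempfile

# Hypothetical stand-in for calibrated_clf_pipe.predict_proba(features_):
# one row per email, columns are [P(class 0), P(class 1)].
test_pred_final = [
    [0.38636364, 0.61363636],
    [0.05147059, 0.94852941],
    [0.86734694, 0.13265306],
]

# Keep one score per email: the probability of class 1 (an assumption;
# use row[0] if class 0 is the one you care about).
scores = [row[1] for row in test_pred_final]

# Written to a temporary directory here so the sketch has no side effects;
# in the original code the target is simply 'scores.txt'.
out_path = os.path.join(tempfile.mkdtemp(), "scores.txt")

# Write each score on its own line. The original loop wrote str(batch_y),
# the whole array, on every iteration instead of a single element.
with open(out_path, "w") as f:
    for score in scores:
        f.write(f"{score}\n")

# Read the file back to confirm there is exactly one line per email.
with open(out_path) as f:
    lines = f.read().splitlines()
```

Using a with block also closes the file, which the original code never did, so buffered output is reliably flushed to disk.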