Prediction output does not match what I expected

Posted 2024-09-29 11:31:33


I have training and test datasets as follows:

train: 2000 files
hit.txt
nohit.txt
hit.txt

Test: 1500 
hit.txt
nohit.txt
hit.txt

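I trained a model that reaches 74% accuracy; the code is as follows:

But when I run predictions on the test dataset, the scores I get come out as an array, which is not what I want: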

test_dir = 'test/'
dictionary = make_dic(test_dir)

features_, labels_ = make_dataset(dictionary)

calibrated_pred_final = calibrated_clf_pipe.predict(features_)

calibrated_pred_final
array([1, 1, 1, ..., 1, 1, 0])

test_pred_final = calibrated_clf_pipe.predict_proba(features_)
import numpy as np
batch_y = np.array(test_pred_final).flatten()

f = open('scores.txt', 'w')
for i in range(len(batch_y)):
    f.write(str(batch_y))    

The scores.txt file looks like this:

[0.38636364 0.61363636 0.05147059 ... 0.61363636 0.86734694 0.13265306][0.38636364 0.61363636 0.05147059 ... 0.61363636 0.86734694 0.13265306][0.38636364 0.61363636 0.05147059 ... 0.61363636 0.86734694 0.13265306] (the same array keeps repeating; this goes on forever)

What I expect is one score per test email, like this:

 0.38636364     
 0.61363636 
 0.05147059
    ...
    ...

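I'm not sure what went wrong. Why do I get these scores as an array repeated 1500 times? I assume the array holds the score for each email, but how do I get rid of the repeated list so that, as in my expected result, each email has just one score?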


1 answer
User
#1 · Posted 2024-09-29 11:31:33

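The predictions themselves are as expected. The problem is in how you write scores.txt. You never index into batch_y, so the loop writes the whole array once for every element of the array. Here is the updated code: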

test_dir = 'test/'
dictionary = make_dic(test_dir)

features_, labels_ = make_dataset(dictionary)

calibrated_pred_final = calibrated_clf_pipe.predict(features_)

calibrated_pred_final
array([1, 1, 1, ..., 1, 1, 0])

test_pred_final = calibrated_clf_pipe.predict_proba(features_)

import numpy as np
batch_y = np.array(test_pred_final).flatten()

# Changed below: index into batch_y so each value goes on its own line.
# The with-block also closes the file, which the original never did.
with open('scores.txt', 'w') as f:
    for i in range(len(batch_y)):
        f.write(str(batch_y[i]) + '\n')  # the original wrote str(batch_y), the whole array
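
To see why the original loop produced that output, here is a minimal sketch with a small stand-in array. The values are made up for illustration and are not real model output. str(batch_y) is the text of the entire array, so writing it inside the loop repeats that text once per element. With NumPy's default print settings, arrays longer than 1000 elements are also abbreviated with "...", which would explain the "..." in each repeated chunk.

```python
import numpy as np

# Stand-in for the flattened predict_proba output; values are illustrative only.
batch_y = np.array([0.25, 0.75, 0.5, 0.5])

# Buggy version: the text of the whole array, repeated once per element.
buggy = "".join(str(batch_y) for _ in range(len(batch_y)))

# Fixed version: one element per line.
fixed = "".join(str(batch_y[i]) + "\n" for i in range(len(batch_y)))

print(buggy)
print(fixed)
```

The buggy string contains one copy of the array per element, while the fixed string has exactly one value per line.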

If you need probabilities, keep using predict_proba. If you want 0/1 predictions, use predict.
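
One more point if the goal is a single score per email: predict_proba returns one row per sample and one column per class, so flattening it yields two numbers per email for a two-class model. The pairs in the output above (0.38636364 and 0.61363636, for example) sum to 1. A minimal sketch of keeping just one column is below. The array is a stand-in with made-up values, and treating column 1 as the "hit" class is an assumption; the fitted estimator's classes_ attribute shows the actual column order.

```python
import numpy as np

# Stand-in for calibrated_clf_pipe.predict_proba(features_); values are illustrative.
# Each row is one email: [P(class 0), P(class 1)].
test_pred_final = np.array([
    [0.25, 0.75],
    [0.5, 0.5],
    [0.875, 0.125],
])

# Keep only the probability of class 1 (assumed here to be the "hit" class).
scores = test_pred_final[:, 1]

# Write one score per line, one line per email.
np.savetxt("scores.txt", scores, fmt="%.8f")
```

With this approach scores.txt has as many lines as there are test emails, rather than twice as many.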
