ROC scores from a function that randomly shuffles the training targets are not random

Posted 2024-10-03 23:20:22

Asked by: anonymous user

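I'm trying to write a function that returns the mean ROC AUC of 10 logistic regression classifiers. Each classifier is trained on a single feature, with the training targets randomly shuffled in a different way each time (so the result can be compared with the ROC score obtained without shuffling). However, the individual ROC scores I get are very strange and clearly not random.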

I tried np.random.shuffle instead of pd.sample (i.e. Series.sample) and got the same results.
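Both Series.sample(frac=1) and np.random.shuffle produce a random permutation of the labels, so swapping one for the other would not be expected to change anything: in both cases only the order of the labels changes, never the class counts. A minimal sketch with made-up labels (the Series y is hypothetical, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical binary labels standing in for the 'target' column
y = pd.Series([0, 1, 1, 0, 1, 0, 0, 1, 1, 1], name="target")

# Variant 1: pandas Series.sample(frac=1) returns all rows in random order
y_pd = y.sample(frac=1, random_state=1).reset_index(drop=True)

# Variant 2: np.random.shuffle permutes a NumPy array in place
y_np = y.to_numpy().copy()
np.random.seed(1)
np.random.shuffle(y_np)

# Both are permutations: the class counts are unchanged
print(y.value_counts().sort_index().tolist())     # [4, 6]
print(y_pd.value_counts().sort_index().tolist())  # [4, 6]
print(np.bincount(y_np).tolist())                 # [4, 6]
```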

import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

def shuffled_roc(df, feature):
    # Shuffle the rows once (fixed seed) before the train/test split
    df = df.sample(frac=1, random_state=0)
    # Keep only rows where the feature value is finite
    mask = np.isfinite(df[feature])
    x = df[feature][mask].copy()
    y = df['target'][mask].copy()

    # 80/20 train/test split
    n_train = int(0.8 * len(x))
    x_train, y_train = x.iloc[:n_train], y.iloc[:n_train]
    x_test, y_test = x.iloc[n_train:], y.iloc[n_train:]

    y_train_shuffled = y_train.sample(frac=1).reset_index(drop=True)

    rocs = []
    for i in range(10):
        # Re-shuffle the training labels so they no longer line up with x_train
        y_train_shuffled = y_train_shuffled.sample(frac=1).reset_index(drop=True)
        lr = LogisticRegression(solver='lbfgs').fit(x_train.values.reshape(-1, 1), y_train_shuffled)

        # AUC on the untouched test set, scored with P(class 1)
        roc = metrics.roc_auc_score(y_test, lr.predict_proba(x_test.values.reshape(-1, 1))[:, 1])
        rocs.append(roc)
    print(rocs)
    return np.mean(rocs)

shuffled_roc(df_accident, 'target_suspension_count')  # df_accident: the asker's own DataFrame (not shown)
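Since df_accident is not shown, here is a self-contained sketch of the same procedure on synthetic data (the DataFrame, its column names x and target, and the sizes are hypothetical). It keeps the 80/20 split and the 10 label shuffles, and also computes the AUC of the raw feature, called A here, for comparison. If the behaviour in the question is reproduced, every shuffled-label AUC should equal either A or 1 - A.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data: one informative feature 'x' and a binary 'target'
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
target = (x + rng.normal(size=n) > 0).astype(int)
df = pd.DataFrame({"x": x, "target": target})

# 80/20 split, as in the question
n_train = int(0.8 * n)
x_train = df["x"].iloc[:n_train].to_numpy().reshape(-1, 1)
x_test = df["x"].iloc[n_train:].to_numpy().reshape(-1, 1)
y_train = df["target"].iloc[:n_train].to_numpy()
y_test = df["target"].iloc[n_train:].to_numpy()

# AUC of the raw feature itself (the reference value A)
auc_raw = metrics.roc_auc_score(y_test, x_test.ravel())

rocs = []
for i in range(10):
    # Break the x/y relationship by permuting the training labels
    y_shuffled = rng.permutation(y_train)
    lr = LogisticRegression(solver="lbfgs").fit(x_train, y_shuffled)
    rocs.append(metrics.roc_auc_score(y_test, lr.predict_proba(x_test)[:, 1]))

print("A =", round(auc_raw, 4), "1 - A =", round(1 - auc_raw, 4))
print("distinct shuffled AUCs:", sorted(set(round(r, 4) for r in rocs)))
```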

I expected the 10 ROC scores to take 10 different values, but this is what I got:

[0.7572317596566523, 0.24276824034334765, 0.24276824034334765, 0.7572317596566523, 0.7572317596566523, 0.7572317596566523, 0.24276824034334765, 0.7572317596566523, 0.7572317596566523, 0.24276824034334765]
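The two values in this output sum to 1 (0.7572317596566523 + 0.24276824034334765 = 1.0). One likely explanation (the thread itself has no answer): ROC AUC depends only on how the scores rank the test rows, and with a single feature the logistic model's score sigmoid(w*x + b) is a monotone function of x. Whatever labels it was trained on, the model ranks the test rows either in the same order as x (w > 0) or in the reverse order (w < 0), so its AUC can only be A or 1 - A, where A is the AUC of the raw feature; shuffling the labels only randomizes the sign of w. The sketch below checks both facts on made-up values (the arrays y and x and the coefficients are hypothetical):

```python
import numpy as np
from sklearn import metrics

# Hypothetical test labels and a single feature (with a tie at 2.0)
y = np.array([0, 0, 1, 0, 1, 1, 0, 1])
x = np.array([0.5, 1.0, 2.0, 2.0, 3.5, 0.8, 1.2, 4.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

auc_x = metrics.roc_auc_score(y, x)
# Reversing the ranking turns AUC into 1 - AUC (ties count as 0.5 either way)
auc_neg = metrics.roc_auc_score(y, -x)

# A one-feature logistic score sigmoid(w*x + b) is monotone in x:
# increasing for w > 0 and decreasing for w < 0, whatever b is
auc_pos_w = metrics.roc_auc_score(y, sigmoid(0.3 * x - 1.0))
auc_neg_w = metrics.roc_auc_score(y, sigmoid(-0.3 * x + 2.0))

print(auc_x, auc_neg, auc_pos_w, auc_neg_w)  # 0.78125 0.21875 0.78125 0.21875
```

A consequence, if this explanation holds: for a single feature, shuffling the training labels cannot yield a spread of AUC values around 0.5. A null distribution would have to come from elsewhere, for example by permuting y_test against the fixed scores of the unshuffled model (a standard permutation test, not something proposed in the thread).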

0 answers