Perfect accuracy on the Pima dataset using normalized data

    col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
    pima = pd.read_csv("pima dataset.txt", names=col_names)
    X = pima[col_names].as_matrix()
    y = pima.label.as_matrix()
    scaler = MinMaxScaler(feature_range=(0, 1))
    rescaledX = scaler.fit_transform(X)
    # summarize transformed data
    np.set_printoptions(precision=3)
    # check transformations
    print(rescaledX[0:5,:])
    X_train, X_test, y_train, y_test = train_test_split(rescaledX, y, test_size=0.2, random_state=42)

    from sklearn.svm import SVC
    import random
    clf_1 = SVC(random_state=42)  # create a default model
    clf_1.fit(X_train, y_train)  # fitting the model
    r_svc = [random.randrange(1,1000) for i in range(3)]  # create a random seed for the 3 simulations.
    scores_matrix_clf_1 = []
    for i in r_svc:
        kf = KFold(n_splits=10, shuffle=True, random_state=i)
        kf.get_n_splits(X)
        scores = cross_val_score(clf_1, X_train, y_train, cv=kf, n_jobs=-1, scoring="accuracy")
        print(' SCORES FOR EACH RANDOM THREE SEEDS', i)
        print('-----------------------------SCORES----------------------------------------')
        print(scores, scores.mean())
        scores_matrix_clf_1.append(scores)

    SCORES FOR EACH RANDOM THREE SEEDS 617
    -----------------------------SCORES----------------------------------------
    [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 1.0
    SCORES FOR EACH RANDOM THREE SEEDS 764
    -----------------------------SCORES----------------------------------------
    [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 1.0
    SCORES FOR EACH RANDOM THREE SEEDS 395
    -----------------------------SCORES----------------------------------------
    [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 1.0

1 answer

User

Answer #1 · posted 2024-05-04 21:05:25

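Your X (the input dataset) contains the label column you are trying to predict. This is called data leakage, and it almost always results in 100% accuracy, because one of the columns (features) already hands the model the answer you want it to predict.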
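For the question's code, the fix would presumably be to build X from every column except label. The following is a minimal sketch under assumptions: the real "pima dataset.txt" is replaced by a hypothetical random DataFrame with the same column names so the snippet runs on its own, and `to_numpy()` stands in for `as_matrix()`, which no longer exists in recent pandas releases.

```python
import numpy as np
import pandas as pd

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin',
             'bmi', 'pedigree', 'age', 'label']

# Hypothetical stand-in for pd.read_csv("pima dataset.txt", names=col_names):
# random values with the same columns, used only to keep this runnable.
rng = np.random.default_rng(0)
pima = pd.DataFrame(rng.random((20, len(col_names))), columns=col_names)
pima['label'] = rng.integers(0, 2, size=len(pima))

# Features are every column except the target.
feature_cols = [c for c in col_names if c != 'label']
X = pima[feature_cols].to_numpy()
y = pima['label'].to_numpy()

print(X.shape, y.shape)  # (20, 8) (20,)
```

With X built this way, the scaler, the train/test split, and the cross-validation loop from the question can stay as they are; the scores should then fall to something more believable than 1.0.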

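Example:

Suppose you have a dataset with the following features:

Person's height
Person's weight
Person's foot size

and you want to predict sex.

If you feed height, weight, foot size and sex into your model as the input dataset, and sex (again) as the output vector, the model will find that the last feature, sex, gets the highest coefficient (weight), because it always "predicts" the correct sex.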
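The effect described above can be reproduced on made-up data. This is only an illustrative sketch: the height, weight and foot-size values are synthetic and standardized (an assumption, not real measurements), and `LogisticRegression` is swapped in for the question's `SVC` so that per-feature coefficients can be inspected.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 400
sex = rng.integers(0, 2, size=n)  # target, encoded as 0 or 1

# Synthetic features that are only weakly related to the target.
height = rng.normal(size=n) + 0.3 * sex
weight = rng.normal(size=n) + 0.3 * sex
foot = rng.normal(size=n) + 0.3 * sex

X_clean = np.column_stack([height, weight, foot])
X_leaky = np.column_stack([height, weight, foot, sex])  # target repeated as a feature

model = LogisticRegression()
acc_clean = cross_val_score(model, X_clean, sex, cv=5, scoring="accuracy").mean()
acc_leaky = cross_val_score(model, X_leaky, sex, cv=5, scoring="accuracy").mean()

# Absolute coefficient of each feature when the target is leaked into X.
coef = np.abs(LogisticRegression().fit(X_leaky, sex).coef_[0])

print("accuracy without leakage:", acc_clean)
print("accuracy with leakage:   ", acc_leaky)
print("abs. coefficients (height, weight, foot, sex):", coef)
```

The leaky model should score at or near perfect accuracy and put by far the largest weight on the sex column, while the clean model should do only modestly better than guessing on these weak features.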
