Decision tree gives two completely different results with a plain train/test accuracy and with K-fold cross-validation

Posted 2024-09-26 18:10:46


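When I run a DecisionTreeClassifier (DTC), I get two completely different results. I want to make sure I am writing the K-fold cross-validation correctly, or else understand why the K-fold result is so much lower than the plain train/test result.

I ran the code below to get both the plain accuracy and the K-fold accuracy: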

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Load the features and the labels
xx = pd.read_csv("data1.dat", delimiter=",")
y = pd.read_csv("label.dat", delim_whitespace=True)

# as_matrix() and numpy.int are gone from recent pandas/NumPy releases,
# so use to_numpy() and the built-in int instead
x = xx.to_numpy()
y = np.array(y).astype(int)

# Random 80/20 split (train_test_split shuffles the rows by default)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only, then apply it to the test data
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

clf2 = DecisionTreeClassifier(random_state=42)
clf2.fit(X_train, y_train)
y_predict_2 = clf2.predict(X_test)
print("DTC Accuracy : ")
print(accuracy_score(y_test, y_predict_2)*100)

DTC Accuracy : 97.63020833333

from sklearn.model_selection import cross_val_score
DTC = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(DTC, x, y, cv=10, scoring='accuracy')
print(scores.mean()*100)

35.331452470904985

from sklearn.model_selection import cross_val_score
DTC = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(DTC, X_train, y_train, cv=10, scoring='accuracy')
print(scores.mean()*100)

97.34356

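However, in the cross-validation part, when I replace x with X_train and y with y_train, the accuracy goes back up to about 97. I would like to know which pair, (x and y) or (X_train and y_train), is the correct and sensible one to use for cross-validation.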


1 answer
User
#1 · Posted 2024-09-26 18:10:46

Try shuffling the data and reducing the number of cross-validation folds:

import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.utils import shuffle
from sklearn.tree import DecisionTreeClassifier

xx = pd.read_csv("data1.dat", delimiter=",")
y = pd.read_csv("label.dat", delim_whitespace=True)

x = xx.to_numpy()  # as_matrix() was removed in pandas 1.0
# Flatten to 1-D (assumes label.dat holds a single label column)
y = y.to_numpy().astype(np.int32).ravel()

# Shuffle features and labels together so every fold sees a mix of samples
x, y = shuffle(x, y, random_state=42)

DTC = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(DTC, x, y, cv=3, scoring='accuracy')
print(scores.mean()*100)
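
A likely reason the two runs in the question differ is that `train_test_split` shuffles the rows by default, while `cross_val_score` with an integer `cv` builds its folds without shuffling, so an ordered file can produce unrepresentative folds. As an alternative to calling `shuffle` by hand, the shuffling can be requested from the splitter itself. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in for `data1.dat` and `label.dat` (which are not available here), and the sample counts and variable names are assumptions. It cross-validates on the training portion and keeps the held-out test set for a final check.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for data1.dat / label.dat (assumption, not the asker's data)
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=42
)

# Hold out a test set first; it is not touched during cross-validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Stratified folds with shuffling enabled, so row order in the file does not matter
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

model = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
print("CV accuracy: %.2f%% (std %.2f)" % (scores.mean() * 100, scores.std() * 100))

# Final check on the held-out test set after refitting on all training data
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("Test accuracy: %.2f%%" % (test_accuracy * 100))
```

With shuffled folds, the cross-validated score and the held-out test score should usually land close to each other. If they still differ widely on the real data, the rows may be grouped (for example, by recording or subject), in which case a group-aware splitter would be worth considering.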
