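I am using a decision tree to predict the first column of my input file (T or N) from the values of the remaining columns (0s and 1s). My input file has the following format: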
T,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
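I want to fit my predictions and obtain a fitted value (y_predfit) that gives the confidence of each prediction, and then use a threshold to decide whether the prediction is T or N: if y_predfit > threshold, then prediction = T; otherwise prediction = N.
I used the following lines of code to obtain y_predfit, but when I print y_predfit all I get is a set of 0s. So I am not getting the fitted values I want, and I am not sure I am using the right lines of code. How can I achieve this and obtain proper fitted values (y_predfit)?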
clf_gini.fit(X_test,y_test)
y_predfit = tree.DecisionTreeClassifier(X_test)
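A note on the two lines above: `tree.DecisionTreeClassifier(X_test)` constructs a new, unfitted classifier instead of querying the trained one, so what gets printed is likely just the estimator with the 0/1 array passed in as a constructor argument. The preceding `fit(X_test, y_test)` call also retrains the model on the test set. A minimal sketch of one way to get a per-row confidence is `predict_proba` on the already trained tree, followed by a manual threshold. The toy data, the 140/60 split, and the value `threshold = 0.7` are assumptions made up for illustration; they are not from the question.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data shaped like the question's file: binary features, labels "T"/"N".
# (Made-up values, for illustration only.)
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 6))
noise = rng.rand(200) < 0.2
# Label is "T" when the first feature is 1, flipped for about 20% of the rows.
y = np.where((X[:, 0] == 1) ^ noise, "T", "N")

# Simple hold-out split (assumption: 140 training rows, 60 test rows).
X_train, X_test = X[:140], X[140:]
y_train, y_test = y[:140], y[140:]

# Same hyperparameters as the question's train_using_gini().
clf = DecisionTreeClassifier(criterion="gini", random_state=100,
                             max_depth=3, min_samples_leaf=5)
clf.fit(X_train, y_train)  # fit on the training data only

# predict_proba returns one column per class, ordered as in clf.classes_.
t_index = list(clf.classes_).index("T")
y_predfit = clf.predict_proba(X_test)[:, t_index]  # estimated P(class == "T")

threshold = 0.7  # hypothetical cut-off, tune for your data
y_pred = np.where(y_predfit > threshold, "T", "N")

print("FIT: ", y_predfit)
print("Thresholded predictions: ", y_pred)
```

For a decision tree, these probabilities are the class fractions of the training samples in the leaf each row falls into, so with `max_depth=3` only a handful of distinct values can appear. In the full script the labels are converted to integer codes with `.cat.codes`, so there the column to pick would be the one whose entry in `clf.classes_` is the code assigned to T, not the string "T".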
Source code
# Run this program on your local python
# interpreter, provided you have installed
# the required libraries.
# Importing the required packages
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from io import StringIO  # sklearn.externals.six was removed from recent scikit-learn releases
from IPython.display import Image
from sklearn.tree import export_graphviz
from sklearn import tree
import collections
import pydotplus
# Function importing Dataset
column_count = 0

def importdata():
    balance_data = pd.read_csv('data1extended.txt', sep=',')
    row_count, column_count = balance_data.shape

    # Printing the dataset shape
    print("Dataset Length: ", len(balance_data))
    print("Dataset Shape: ", balance_data.shape)
    print("Number of columns ", column_count)

    # Printing the dataset observations
    print("Dataset: ", balance_data.head())

    # Encode the label column as integer category codes
    balance_data['gold'] = balance_data['gold'].astype('category').cat.codes
    return balance_data, column_count

def columns(balance_data):
    row_count, column_count = balance_data.shape
    return column_count

# Function to split the dataset
def splitdataset(balance_data, column_count):
    # Separating the target variable
    X = balance_data.values[:, 1:column_count]
    Y = balance_data.values[:, 0]

    # Splitting the dataset into train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)
    return X, Y, X_train, X_test, y_train, y_test

# Function to perform training with giniIndex.
def train_using_gini(X_train, X_test, y_train):
    # Creating the classifier object
    clf_gini = DecisionTreeClassifier(criterion="gini",
        random_state=100, max_depth=3, min_samples_leaf=5)

    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini

# Function to make predictions
def prediction(X_test, clf_object):
    # Prediction on test with giniIndex
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred

def main():
    # Building Phase
    data, column_count = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data, column_count)
    clf_gini = train_using_gini(X_train, X_test, y_train)

    # tried to generate the fit value here and failed
    clf_gini.fit(X_test, y_test)
    y_predfit = tree.DecisionTreeClassifier(X_test)
    print('FIT: ', y_predfit)

if __name__ == "__main__":
    main()
No answers yet.