How can I get fitted values from a decision tree?

Posted 2024-09-30 10:42:17


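I am using a decision tree to predict the first column of my input file (T or N) from the values of the remaining columns (0s and 1s). My input file has the following format: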

T,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0    

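I want to fit my predictions and obtain a fitted value (y_predfit) that gives the confidence of each prediction, and then use a threshold to decide whether the prediction is T or N: if y_predfit > threshold, then prediction = T, otherwise prediction = N. I used the following lines of code to get y_predfit, but when I print y_predfit all I get is a bunch of zeros, so I am not getting the fitted value I want, and I am not sure I am using the right lines of code. How can I achieve this and get a proper fitted value (y_predfit)?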

clf_gini.fit(X_test,y_test) 
y_predfit = tree.DecisionTreeClassifier(X_test)

Source code

# Run this program on your local python
# interpreter, provided you have installed
# the required libraries.

# Importing the required packages 
import numpy as np 
import pandas as pd 
from sklearn.metrics import confusion_matrix 
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier 
from sklearn.metrics import accuracy_score 
from sklearn.metrics import classification_report 
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from io import StringIO  # sklearn.externals.six is gone from current scikit-learn
from IPython.display import Image  
from sklearn.tree import export_graphviz
from sklearn import tree
import collections

import pydotplus
# Function importing Dataset 
column_count =0 
def importdata(): 
    balance_data = pd.read_csv( 'data1extended.txt', sep= ',') 
    row_count, column_count = balance_data.shape

    # Printing the dataset shape
    print ("Dataset Length: ", len(balance_data)) 
    print ("Dataset Shape: ", balance_data.shape) 
    print("Number of columns ", column_count)

    # Printing the dataset observations
    print ("Dataset: ",balance_data.head()) 
    balance_data['gold'] = balance_data['gold'].astype('category').cat.codes

    return balance_data, column_count 
def columns(balance_data): 
    row_count, column_count = balance_data.shape
    return column_count
# Function to split the dataset 
def splitdataset(balance_data, column_count): 

    # Separating the target variable 
    X = balance_data.values[:, 1:column_count] 
    Y = balance_data.values[:, 0] 

    # Splitting the dataset into train and test 
    X_train, X_test, y_train, y_test = train_test_split( 
    X, Y, test_size = 0.3, random_state = 100) 

    return X, Y, X_train, X_test, y_train, y_test 

# Function to perform training with giniIndex. 
def train_using_gini(X_train, X_test, y_train): 

    # Creating the classifier object 
    clf_gini = DecisionTreeClassifier(criterion = "gini", 
            random_state = 100,max_depth=3, min_samples_leaf=5) 

    # Performing training 
    clf_gini.fit(X_train, y_train) 
    return clf_gini 


# Function to make predictions 
def prediction(X_test, clf_object): 

    # Prediction on test with giniIndex
    y_pred = clf_object.predict(X_test) 



    print("Predicted values:") 
    print(y_pred) 
    return y_pred 


def main(): 

    # Building Phase 
    data,column_count = importdata() 
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data, column_count) 
    clf_gini = train_using_gini(X_train, X_test, y_train) 


    #tried to generate the fit value here and failed 
    clf_gini.fit(X_test,y_test) 
    y_predfit = tree.DecisionTreeClassifier(X_test)

    print('FIT:   ',y_predfit)




if __name__=="__main__": 
    main() 
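
For the fitted value asked about in the question, one option that may help is `predict_proba`, which scikit-learn's `DecisionTreeClassifier` provides alongside `predict`. It returns, for each sample, the class fractions of the leaf the sample falls into, which can serve as a rough confidence. The sketch below is not the original program: the random toy data, the labelling rule, and the 0.5 `threshold` are assumptions made up for illustration, and the labels are kept as the strings "T"/"N" rather than the integer codes produced by `cat.codes` in the source above. Note also that it fits only on the training rows; refitting on `X_test` is not needed to get these values.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the question's file: 0/1 features and "T"/"N" labels.
# The labelling rule below is hypothetical, just to give the tree a signal.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 8))
y = np.where(X[:, 0] + X[:, 1] + rng.random(200) > 1.5, "T", "N")

X_train, y_train = X[:140], y[:140]
X_test = X[140:]

# Same hyperparameters as train_using_gini in the source above.
clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                  max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)  # fit on the training rows only

# predict_proba returns one column per class, ordered as in clf_gini.classes_.
t_col = list(clf_gini.classes_).index("T")
y_predfit = clf_gini.predict_proba(X_test)[:, t_col]

# Apply a custom cut-off instead of relying on predict().
threshold = 0.5  # hypothetical value; tune it for your data
y_thresholded = np.where(y_predfit > threshold, "T", "N")

print("FIT:", y_predfit[:10])
print("Thresholded:", y_thresholded[:10])
```

With `max_depth=3` the tree has at most eight leaves, so `y_predfit` can take only a handful of distinct values; a deeper tree or an ensemble would give a finer-grained score. If the target has been converted with `cat.codes` as in the source above, the classes become integers (pandas orders categories alphabetically, so N should map to 0 and T to 1), and the column lookup would use that integer code instead of "T".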
