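I'm very new to Python 2.7. I'm trying to run a decision tree classifier on my dataset, but after following a tutorial I ran into this problem. I first vectorize my feature columns and save the result to an array, then use LabelEncoder to encode the target column into an array. Could you explain how to fix this error?
Data:
Code: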
import pandas as pd
dataset = "C:/Users/ashik swaroop/Desktop/anaconda/Gene Dataset/Final.csv"
datacan = pd.read_csv(dataset)
datacan = datacan.fillna('')
features = datacan[[
"Tumour_Types_Somatic","Tumour_Types_Germline",
"Cancer_Syndrome","Tissue_Type",
"Role_in_Cancer","Mutation_Types","Translocation_Partner",
"Other_Syndrome","Tier","Somatic","Germline",
"Molecular_Genetics","Other_Germline_Mut"]]
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder
X_dict = features.to_dict().values()
vect = DictVectorizer(sparse=False)
X_vector = vect.fit_transform(X_dict)
le = LabelEncoder()
y_train = le.fit_transform(datacan['Gene_Symbol'][:-1])
X_Train = X_vector[:-1]
X_Test = X_vector[-1:]
from sklearn import tree
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(X_Train,y_train)
I get this error:
[error traceback not preserved in the source]
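First, to understand the error: it appears that the number of training samples (i.e. np.shape(X_Train)[0]) does not match the number of labels (i.e. np.shape(y_train)[0]). Looking at the code, I noticed some inconsistencies. Please refer to the inline comments below.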
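One likely source of the mismatch, sketched here under assumptions: `features.to_dict()` with the default orientation returns one dict per column, so `.values()` hands `DictVectorizer` 13 "samples" (one per feature column) rather than one per row. Passing `orient="records"` gives one dict per row instead. The small DataFrame below is a hypothetical stand-in for `Final.csv`, which is not shown in the question, and only two of the feature columns are used to keep the example short.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn import tree

# Hypothetical stand-in for Final.csv (made-up rows, two feature columns only)
datacan = pd.DataFrame({
    "Tissue_Type": ["E", "L", "E", "M"],
    "Role_in_Cancer": ["oncogene", "TSG", "fusion", "TSG"],
    "Gene_Symbol": ["ABL1", "ATM", "ALK", "APC"],
})
features = datacan[["Tissue_Type", "Role_in_Cancer"]]

# Default to_dict() is keyed by column: .values() yields one dict per COLUMN
per_column = list(features.to_dict().values())

# orient="records" yields one dict per ROW, which is what DictVectorizer expects
X_dict = features.to_dict(orient="records")

vect = DictVectorizer(sparse=False)
X_vector = vect.fit_transform(X_dict)

le = LabelEncoder()
y = le.fit_transform(datacan["Gene_Symbol"])

# Hold out the last row, slicing features and labels the same way
X_train, y_train = X_vector[:-1], y[:-1]
X_test = X_vector[-1:]

# Sanity check: sample count and label count must agree before fitting
assert X_train.shape[0] == y_train.shape[0]

clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(X_train, y_train)
prediction = clf.predict(X_test)
```

Printing `X_vector.shape` and `len(y_train)` right before `clf.fit` in the original script should confirm whether this is the cause there.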