How to apply a binary classifier in scikit-learn when the attributes are strings (not int or float)

# Read the CSV file into NumPy arrays
import numpy as np
from sklearn import naive_bayes
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score

path = "/path/to/csv/file/BinaryClassification.csv"
# Note: np.loadtxt parses every field as float, so string attributes fail here
data = np.loadtxt(path, delimiter=',')
lstAttributes = data[:, 0:2]
lstLabels = data[:, 2].astype(float)

# Train and test the algorithm (uses the whole data set for both training and testing)
classifier = naive_bayes.GaussianNB()
model = classifier.fit(lstAttributes, lstLabels)
prediction = model.predict(lstAttributes)
print(confusion_matrix(lstLabels, prediction))

# Use 5-fold cross-validation to evaluate the algorithm
scores = cross_val_score(classifier, lstAttributes, lstLabels, cv=5, scoring='f1')
print("cross validation: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

1 Answer

User

#1 · Posted 2024-09-30 20:27:58

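In general, to convert strings into numeric feature values you have to know what the strings mean, and you also have to consider which learning algorithm will consume the result. In this case, one-hot encoding is the best thing to try first, and scikit-learn provides an implementation of it. The result is a sparse matrix of indicator variables, so you would do better to switch from GaussianNB to a classifier suited to that kind of input (with your current encoding, GaussianNB does not make sense).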
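As a rough illustration of the one-hot approach, here is a minimal sketch. The answer does not name specific classes, so `OneHotEncoder` and `BernoulliNB` are assumptions chosen for this example, and the toy data (colors, sizes, labels) is made up rather than taken from the question's CSV file.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two string attributes per row and a binary label.
colors = ["red", "blue", "green", "red"] * 5
sizes = ["small", "large"] * 10
X = np.array(list(zip(colors, sizes)))
y = (X[:, 0] == "red").astype(int)  # label is 1 for "red" rows, else 0

# One-hot encode the string columns, then fit a Naive Bayes model
# intended for binary indicator features (assumed choice: BernoulliNB).
# handle_unknown="ignore" avoids errors if a CV fold sees an unseen category.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), BernoulliNB())
model.fit(X, y)

# The encoder output is a sparse matrix with one column per distinct category.
encoded = model.named_steps["onehotencoder"].transform(X)
print(encoded.shape)

# Predict on the training data, mirroring the question's evaluation.
prediction = model.predict(X)

# 5-fold cross-validation with the F1 score, as in the question.
scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print("cross validation: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```

Putting the encoder inside the pipeline means it is refit on each training fold, so the cross-validation scores are not influenced by categories seen only in the held-out fold.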
