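<p>I want to create a MultiOutputClassifier in Python with the scikit-learn library, and I want to find out the model's features and its accuracy.
All of the data in my database is categorical (string values).
I know why this happens, but I keep getting this error:</p>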
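<p>For reference, scikit-learn's tree estimators need numeric input, so string columns are usually encoded before fitting. Below is a minimal sketch using <code>OrdinalEncoder</code>, which maps each category of each column to an integer code (categories are sorted, so codes follow alphabetical order). The column names and values are hypothetical toy data, not taken from <code>income_education.csv</code>:</p>

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical string-valued columns, similar in spirit to the question's data.
X = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad"],
    "income": ["<=50K", ">50K", ">50K", "<=50K"],
})

encoder = OrdinalEncoder()
# fit_transform learns the sorted categories per column and returns a float array of codes.
X_encoded = encoder.fit_transform(X)

print(encoder.categories_)
print(X_encoded.tolist())  # → [[0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [1.0, 0.0]]
```

<p><code>OneHotEncoder</code> is the usual alternative when the integer order of the codes should not carry any meaning.</p>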
<p><code>ValueError: could not convert string to float: '&lt;=50K'</code></p>
<p>The error is raised on this line:
<code>model = cls.fit(features_train, result_train)</code></p>
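<p>The traceback points at <code>fit</code> because <code>DecisionTreeClassifier</code> converts its feature matrix to floating point before training, and any raw string in it makes that conversion fail. In the code below, <code>features = df.iloc[:,-1]</code> selects the last column, which appears to hold the income label, so a value such as <code>'&lt;=50K'</code> ends up in the features. A minimal reproduction on hypothetical data:</p>

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical one-column feature table holding raw strings, like the question's features.
X = pd.DataFrame({"income": ["<=50K", ">50K", "<=50K"]})
y = ["HS-grad", "Masters", "HS-grad"]

try:
    DecisionTreeClassifier().fit(X, y)
    error_message = None
except ValueError as exc:
    # The string-to-float conversion of X fails before any tree is built.
    error_message = str(exc)

print(error_message)
```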
<p>Here is the code:</p>
<pre><code>import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn.multioutput import MultiOutputClassifier
df = pd.read_csv('income_education.csv')
#creating features and results for my model
features = df.iloc[:,-1]
results = df.iloc[:,:-1]
# splitting my data into train and test sets
features_train, features_test, result_train, result_test = train_test_split(features, results, test_size = 0.3, random_state = 42)
classifier = MultiOutputClassifier(tree.DecisionTreeClassifier())
#model fitting
cls = classifier
model = cls.fit(features_train, result_train)
pred = model.predict([cv.transform(['more'])])
print(pred)
# How do I check the accuracy of this classifier?
</code></pre>
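<p>A possible end-to-end sketch, assuming (as in the code above) that the last column is used as the feature and the remaining columns are the targets. The DataFrame is hypothetical toy data standing in for <code>income_education.csv</code>, and <code>OrdinalEncoder</code> is swapped in for the undefined <code>cv</code> object. Note that <code>df.iloc[:, [-1]]</code> keeps the features two-dimensional, whereas <code>df.iloc[:, -1]</code> returns a 1-D Series, which scikit-learn estimators reject as <code>X</code>. The sketch then computes per-output accuracy, the subset (exact-match) accuracy from <code>MultiOutputClassifier.score</code>, and each fitted tree's <code>feature_importances_</code>:</p>

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for income_education.csv (all strings).
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad", "Bachelors", "Masters",
                  "HS-grad", "Bachelors", "Masters", "HS-grad", "Bachelors", "Masters"],
    "workclass": ["Private", "Private", "Self-emp", "Private", "Self-emp", "Private",
                  "Self-emp", "Private", "Self-emp", "Private", "Private", "Self-emp"],
    "income": ["<=50K", "<=50K", ">50K", "<=50K", ">50K", ">50K",
               "<=50K", "<=50K", ">50K", "<=50K", ">50K", ">50K"],
})

# Same split as the question: last column = feature, remaining columns = targets.
# Double brackets keep X two-dimensional (a DataFrame, not a Series).
X = df.iloc[:, [-1]]
Y = df.iloc[:, :-1]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

# Encode the string features as numbers; categories unseen during fit map to -1.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
X_train_enc = encoder.fit_transform(X_train)
X_test_enc = encoder.transform(X_test)

# String target labels are fine for classifiers; only X must be numeric.
model = MultiOutputClassifier(DecisionTreeClassifier(random_state=42))
model.fit(X_train_enc, Y_train)

Y_pred = model.predict(X_test_enc)

# accuracy_score does not support multiclass-multioutput targets, so score each column.
per_output_acc = {
    col: accuracy_score(Y_test[col], Y_pred[:, i]) for i, col in enumerate(Y.columns)
}

# score() gives subset accuracy: a row counts only if every output is predicted correctly.
subset_acc = model.score(X_test_enc, Y_test.to_numpy())

# One fitted tree per target column; each exposes feature_importances_.
importances = {
    col: est.feature_importances_ for col, est in zip(Y.columns, model.estimators_)
}

# A new sample must go through the same encoder before predict().
new_sample = pd.DataFrame({"income": [">50K"]})
new_pred = model.predict(encoder.transform(new_sample))

print(per_output_acc)
print(subset_acc)
print(importances)
print(new_pred)
```

<p>With a single feature column, each tree's importance array has one entry; on real data with more feature columns, the arrays line up with the encoder's column order.</p>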