<p><strong>When passing input to a classifier, pass a 2D array</strong> (of shape <code>(N, M)</code>, where M &gt;= 1)<strong>, not a 1D array</strong> (of shape <code>(N,)</code>). The error message is quite clear:</p>
<blockquote>
<p>Reshape your data either using <code>array.reshape(-1, 1)</code> if your data has a
single feature or <code>array.reshape(1, -1)</code> if it contains a single sample.</p>
</blockquote>
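<p>As a minimal sketch of what the two reshape calls in that message do, the snippet below uses a made-up 1D NumPy array (the name <code>feature</code> and its values are assumptions for illustration, not part of the original data):</p>

```python
import numpy as np

# Hypothetical data: four samples of a single feature, stored as a 1D array.
feature = np.array([1.0, 2.0, 3.0, 4.0])

# One feature, many samples: column of shape (n_samples, 1).
as_single_feature = feature.reshape(-1, 1)

# One sample, many features: row of shape (1, n_features).
as_single_sample = feature.reshape(1, -1)

print(feature.shape)            # (4,)
print(as_single_feature.shape)  # (4, 1)
print(as_single_sample.shape)   # (1, 4)
```

<p>For a single column of training data, <code>reshape(-1, 1)</code> is usually the one you want, since each row is then one sample.</p>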
<pre><code>from sklearn.model_selection import train_test_split

# X must be 2D: double brackets select a DataFrame of shape (N, M), M >= 1
X = mydata[['script']]
# y should be 1D, of shape (N,)
y = mydata['label']
# perform label encoding if "label" contains strings
# (requires `import pandas as pd`)
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
...
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
</code></pre>
<p>A few other helpful tips:</p>
<ol>
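<li>Split your data into separate training and test sets. Don't test on the data you trained on; doing so gives an inaccurate (typically overly optimistic) estimate of how good the classifier is.</li>
<li>I recommend factorizing your labels so that you are working with integers. It is simply easier.</li>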
</ol>
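<p>The label-factorizing tip can be sketched as follows. The label values here are invented for illustration and stand in for <code>mydata['label']</code>; <code>pd.factorize</code> assigns integer codes in order of first appearance and also returns the distinct labels, so the codes can be mapped back later:</p>

```python
import pandas as pd

# Hypothetical string labels standing in for mydata['label'].
labels = pd.Series(["spam", "ham", "spam", "eggs"])

# codes: 1D integer array with one code per row.
# uniques: the distinct labels, indexed by their code.
codes, uniques = pd.factorize(labels)

print(codes.tolist())  # [0, 1, 0, 2]
print(list(uniques))   # ['spam', 'ham', 'eggs']

# Map integer codes (e.g. classifier predictions) back to the original labels.
decoded = uniques[codes]
print(list(decoded))   # ['spam', 'ham', 'spam', 'eggs']
```

<p>Keeping <code>uniques</code> around is what lets you translate integer predictions back into readable labels.</p>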