<p>For example, you can use the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html" rel="nofollow noreferrer"><code>SGDClassifier</code></a> from scikit-learn (<code>sklearn</code>). A linear classifier computes its predictions as follows (see <a href="https://github.com/scikit-learn/scikit-learn/blob/5a74e2f1c8470c527018dba78f86557f40eaeb47/sklearn/linear_model/base.py#L272" rel="nofollow noreferrer">the source code</a>):</p>
<pre><code>def predict(self, X):
    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(np.int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]
</code></pre>
<p>where <a href="https://github.com/scikit-learn/scikit-learn/blob/5a74e2f1c8470c527018dba78f86557f40eaeb47/sklearn/linear_model/base.py#L278" rel="nofollow noreferrer"><code>decision_function</code></a> is given by:</p>
<pre><code>def decision_function(self, X):
    [...]
    scores = safe_sparse_dot(X, self.coef_.T,
                             dense_output=True) + self.intercept_
    return scores.ravel() if scores.shape[1] == 1 else scores
</code></pre>
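<p>Putting the two together: for a fitted binary classifier, <code>predict</code> amounts to thresholding <code>X @ coef_.T + intercept_</code> at zero. A minimal sketch that checks this on toy data (the data and labels here are made up for illustration):</p>
<pre><code>import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # toy, linearly separable labels

clf = SGDClassifier(random_state=0).fit(X, y)

# Reproduce decision_function by hand.
scores = X @ clf.coef_.T + clf.intercept_
assert np.allclose(scores.ravel(), clf.decision_function(X))

# In the binary case, predict is just a sign test on the scores.
manual = clf.classes_[(scores.ravel() > 0).astype(int)]
assert np.array_equal(manual, clf.predict(X))
</code></pre>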
<p>So for the two-dimensional case of your example, this means that a data point is classified as <code>+1</code> if</p>
<pre><code>x*w1 + y*w2 + i > 0
</code></pre>
<p>where</p>
<pre><code>x, y = X
w1, w2 = self.coef_
i = self.intercept_
</code></pre>
<p>and <code>-1</code> otherwise. So the decision depends on whether <code>x*w1 + y*w2 + i</code> is greater than, less than, or equal to zero. The “border” is thus found by setting <code>x*w1 + y*w2 + i == 0</code>: we are free to choose one of the two components, and the other one is then determined by this equation.</p>
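<p>For instance, fixing <code>x</code> and solving <code>x*w1 + y*w2 + i == 0</code> for <code>y</code> yields a point on the border. A small sketch (the <code>border_y</code> helper is hypothetical, not part of scikit-learn):</p>
<pre><code>import numpy as np
from sklearn.linear_model import SGDClassifier

def border_y(clf, x):
    # Solve x*w1 + y*w2 + i == 0 for y.
    w1, w2 = clf.coef_[0]
    return -(clf.intercept_[0] + w1 * x) / w2

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 1] > 0.5 * X[:, 0], 1, -1)
clf = SGDClassifier(random_state=1).fit(X, y)

# Points on the border score (numerically) zero.
pts = np.array([[x, border_y(clf, x)] for x in (-1.0, 0.0, 1.0)])
print(clf.decision_function(pts))  # close to [0, 0, 0]
</code></pre>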
<p>The following snippet fits a <code>SGDClassifier</code> and plots the resulting “border”. It assumes that the data points are scattered around the origin (<code>x, y = 0, 0</code>), i.e. that their mean is (approximately) zero. In practice, to obtain good results, one should first subtract the mean from the data points, then perform the fit, and afterwards add the mean back to the result. The snippet below simply scatters the points around the origin.</p>
<pre><code>import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import SGDClassifier

n = 100
x = np.random.uniform(-1, 1, size=(n, 2))

# We assume the points are scattered around the origin.
b = np.zeros(2)
d = np.random.uniform(-1, 1, size=2)
slope, intercept = (d[1] / d[0]), 0.

fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(x[:, 0], x[:, 1], color='black')
ax.plot([b[0], d[0]], [b[1], d[1]], 'b-', label='Ideal')

labels = []
for point in x:
    # Label points above the ideal line with +1, points below with -1.
    if point[1] > slope * point[0] + intercept:
        ax.annotate('+', xy=point, xytext=(0, -10), textcoords='offset points', color='blue', ha='center', va='center')
        labels.append(1)
    else:
        ax.annotate('-', xy=point, xytext=(0, -10), textcoords='offset points', color='red', ha='center', va='center')
        labels.append(-1)
labels = np.array(labels)

classifier = SGDClassifier()
classifier.fit(x, labels)

# Choose an arbitrary x1 and solve x1*w1 + x2*w2 + i == 0 for x2.
x1 = np.random.uniform(-1, 1)
x2 = (-classifier.intercept_ - x1 * classifier.coef_[0, 0]) / classifier.coef_[0, 1]

ax.plot([0, x1], [0, x2], 'g--', label='Fit')
plt.legend()
plt.show()
</code></pre>
<p>This figure shows the result for <code>n = 100</code> data points:</p>
<p><a href="https://i.stack.imgur.com/GeWvc.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/GeWvc.png" alt="Result for n=100"/></a></p>
<p>The following figure shows the results for different <code>n</code>, where the points were randomly chosen from a pool containing 1000 data points:</p>
<p><a href="https://i.stack.imgur.com/bN5ID.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/bN5ID.png" alt="Results for different n"/></a></p>
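<p>As noted above, when the data is not scattered around the origin, one should subtract the mean before fitting and shift the resulting border back afterwards. A minimal sketch of that centering step, under the same assumptions as the snippet above:</p>
<pre><code>import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)
# An off-center point cloud (shifted away from the origin).
X = rng.uniform(-1, 1, size=(200, 2)) + np.array([3.0, -2.0])
mean = X.mean(axis=0)
y = np.where(X[:, 1] - mean[1] > X[:, 0] - mean[0], 1, -1)

clf = SGDClassifier(random_state=2).fit(X - mean, y)  # fit on centered data

# Border in the original coordinates: solve in centered coordinates,
# then add the mean back.
w1, w2 = clf.coef_[0]
i = clf.intercept_[0]
xs = np.array([X[:, 0].min(), X[:, 0].max()])
ys = -(i + w1 * (xs - mean[0])) / w2 + mean[1]
</code></pre>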