<p>I am using the method from <a href="https://link.springer.com/article/10.1023/A:1009745219419" rel="nofollow noreferrer">Sander et al. (1998)</a> to determine MinPts and epsilon for running DBSCAN on my dataset.
As Sander et al. suggest, MinPts = k = 2 * dim - 1 (in my case there are 9 dimensions, so MinPts = k = 17).
The paper says to choose the threshold at the "first valley" of the sorted k-dist plot. I can see two valleys, but which one is the first? And what value would you choose for epsilon?
<a href="https://i.stack.imgur.com/i2xU1.jpg" rel="nofollow noreferrer">kdistplot_with_duplicates</a></p>
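<p>One way to make the "first valley" reading less subjective is a knee heuristic on the sorted k-dist curve. This is a swapped-in technique, not the procedure from Sander et al.: the hypothetical helper <code>suggest_eps</code> normalizes the descending curve to [0, 1] on both axes and returns the point farthest from the straight line joining the curve's first and last values. It assumes at least two distinct k-dist values. Note that it finds the single most pronounced knee, which need not coincide with the first valley when the plot shows two, so treat its output as a starting point to compare against the plot rather than as an answer.</p>
<pre><code>import numpy as np

def min_pts_for(dim):
    # Heuristic quoted above: MinPts = k = 2 * dim - 1
    return 2 * dim - 1

def suggest_eps(k_dist_desc):
    """Hypothetical knee finder for a k-dist curve sorted in descending order.

    Returns (index, eps) of the point with the largest perpendicular
    distance to the chord between the first and last points of the curve.
    """
    y = np.asarray(k_dist_desc, dtype=float)
    x = np.arange(len(y), dtype=float)
    # Scale both axes to [0, 1] so the chord distance does not depend on units
    xn = x / x[-1]
    yn = (y - y.min()) / (y.max() - y.min())
    # Chord from (0, yn[0]) to (1, yn[-1]); perpendicular distance of each point to it
    dx, dy = 1.0, yn[-1] - yn[0]
    dist = np.abs(dy * xn - dx * (yn - yn[0])) / np.hypot(dx, dy)
    i = int(np.argmax(dist))
    return i, y[i]
</code></pre>
<p>For 9 dimensions, <code>min_pts_for(9)</code> gives 17, and <code>suggest_eps(distanceDec)</code> would return the index and k-dist value of the most pronounced knee in the question's curve.</p>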
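<p>Sander et al. also suggest using this method only when the data contains no duplicate points, so here is the same plot with duplicates removed (although I don't think it matters in this case):
<a href="https://i.stack.imgur.com/UhBVP.jpg" rel="nofollow noreferrer">kdistplot_without_duplicates</a>
Which valley should be considered the "first" valley?</p>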
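<p>To produce the duplicate-free plot, duplicate rows can be dropped before computing the k-dist values. A minimal sketch, assuming <code>data</code> is a numeric 2-D array; <code>sorted_k_dist</code> is a hypothetical helper that mirrors the question's convention of counting each point as its own first neighbour (<code>n_neighbors=k</code>, column <code>k-1</code>). With duplicates kept, any point that has k-1 or more exact copies gets a k-dist of 0, which pulls the right-hand end of the curve down to zero.</p>
<pre><code>import numpy as np
from sklearn.neighbors import NearestNeighbors

def sorted_k_dist(data, k, drop_duplicates=True):
    """Sorted (descending) k-dist values, optionally after removing duplicate rows."""
    X = np.asarray(data, dtype=float)
    if drop_duplicates:
        # Keep one copy of each identical row (axis=0 compares whole rows)
        X = np.unique(X, axis=0)
    # Querying the fitted points themselves: column 0 is each point's distance to itself
    nbrs = NearestNeighbors(n_neighbors=k, metric='euclidean').fit(X)
    distances, _ = nbrs.kneighbors(X)
    return np.sort(distances[:, k - 1])[::-1]
</code></pre>
<p>Calling <code>sorted_k_dist(data, 17)</code> and <code>sorted_k_dist(data, 17, drop_duplicates=False)</code> gives the two curves to plot side by side.</p>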
<p>Code used:</p>
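<pre><code>import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

ns = 17  # k = MinPts = 2*9 - 1
# Passing data to kneighbors counts each point as its own first neighbour (distance 0)
nbrs = NearestNeighbors(n_neighbors=ns, metric='euclidean').fit(data)
distances, indices = nbrs.kneighbors(data)
# k-dist of every point, sorted in descending order
distanceDec = sorted(distances[:, ns-1], reverse=True)
# x axis runs over all points, so this also works after duplicates are removed
plt.plot(list(range(1, len(distanceDec) + 1)), distanceDec)
</code></pre>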
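<p>Once an epsilon value has been read off the plot, the parameters can be passed to scikit-learn's <code>DBSCAN</code>. A minimal sketch: the wrapper <code>run_dbscan</code> is hypothetical, and the <code>eps</code> you pass is whatever value you settle on from the plot. scikit-learn's <code>min_samples</code> counts the point itself, which matches the k-dist code above, where each point is its own first neighbour.</p>
<pre><code>import numpy as np
from sklearn.cluster import DBSCAN

def run_dbscan(data, eps, min_pts=17):
    # min_samples includes the point itself, like n_neighbors=ns in the k-dist code
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(data)
    # Label -1 marks noise; every other label is a cluster id
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return labels, n_clusters, n_noise
</code></pre>
<p>Comparing the number of clusters and noise points for the epsilon values at each of the two valleys is a quick way to see which choice gives a sensible result for this dataset.</p>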