Python: point clustering / averaging

Published 2024-06-25 06:41:44


I have a detector that returns the bounding-box centers of detected objects, and for the most part it works fine. However, what I want to do is run the detection over 10 frames instead of 1, so that more false positives can be eliminated.

My detector currently works as follows:

1. Get a frame.
2. Conduct the algorithm. 
3. Record the centers into a dictionary per each frame. 

What I think would help reduce false positives:

1. Set up a loop of 10: 
   1. Get a frame.
   2. Conduct the algorithm. 
   3. Record the centers into a dictionary per each frame.
2. Loop over the recorded points after every 10 frames.
3. Use a clustering algorithm or simple distance averaging
4. Get the final centers.
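The accumulation part of the plan (steps 1.1–1.3) can be sketched as below; `get_frame()` and `detect_centers()` are hypothetical stand-ins for the poster's actual capture and detection code, and the dictionary layout matches the `{(x, y): [type, name]}` shape shown further down:

```python
# Minimal sketch of the 10-frame accumulation loop. get_frame() and
# detect_centers() are hypothetical stand-ins for the real capture and
# detection steps; they are not part of the original post.
def collect_centers(get_frame, detect_centers, n_frames=10):
    """Run the detector on n_frames frames and pool every detected
    center into one dictionary: {(x, y): [type, name], ...}."""
    pooled = {}
    for _ in range(n_frames):
        frame = get_frame()
        # near-identical centers from different frames have slightly
        # different (x, y) keys, so they all survive in the pool
        pooled.update(detect_centers(frame))
    return pooled
```

The pooled dictionary is then what step 2 loops over every 10 frames.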

So I have implemented some of this logic. I am at step 1.3, and I need to find a way to group the coordinates and compute the final estimates.

After 10 frames, my dictionary holds values like these (I can't paste all of them):

      (4067.0, 527.0): ['torx8', 'screw8'], 
      (4053.0, 527.0): ['torx8', 'screw1'], 
      (2627.0, 707.0): ['torx8', 'screw12'], 
      (3453.0, 840.0): ['torx6', 'screw14'], 
      (3633.0, 1373.0): ['torx6', 'screw15'], 
      (3440.0, 840.0): ['torx6', 'screw14'], 
      (3447.0, 840.0): ['torx6', 'screw14'], 
      (1660.0, 1707.0): ['torx8', 'screw3'], 
      (2633.0, 700.0): ['torx8', 'screw7'], 
      (2627.0, 693.0): ['torx8', 'screw8'], 
      (4060.0, 533.0): ['torx8', 'screw6'], 
      (3627.0, 1367.0): ['torx6', 'screw13'], 
      (2600.0, 680.0): ['torx8', 'screw15'], 
      (2607.0, 680.0): ['torx8', 'screw7']

As you can see, most of these are really the same point with a slight pixel offset, which is why I am trying to find a way to eliminate these so-called duplicates.

Is there a smart and efficient way of handling this? k-means clustering was my first thought, but I am not sure it fits this problem, since the number of clusters is not known in advance.
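For comparison, the "simple distance averaging" option from step 3 can be done without any clustering library: greedily merge points whose distance to a group's running mean is under a pixel tolerance, then replace each group by its mean. The 20-pixel tolerance below is an assumed value, not something from the post:

```python
# Greedy distance-averaging dedup: each point joins the first group whose
# running mean lies within tol pixels; otherwise it starts a new group.
# tol=20.0 is an illustrative assumption.
import math

def merge_close_points(points, tol=20.0):
    groups = []  # each group is a list of (x, y) tuples
    for p in points:
        for g in groups:
            mx = sum(q[0] for q in g) / len(g)
            my = sum(q[1] for q in g) / len(g)
            if math.hypot(p[0] - mx, p[1] - my) <= tol:
                g.append(p)
                break
        else:
            groups.append([p])
    # collapse every group to its mean center
    return [(sum(q[0] for q in g) / len(g),
             sum(q[1] for q in g) / len(g)) for g in groups]
```

This is O(n·k) and order-dependent, so it is only a baseline; a density-based clusterer handles noise points more gracefully.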

Has anyone dealt with something similar?

Edit: OK, I have made some progress. I was able to cluster the points with density-based clustering (DBSCAN), since in my case I have no prior knowledge of the number of clusters, so an approximate method is needed:

# cluster now (assumes: import numpy as np, import matplotlib.pyplot as plt,
# from sklearn.preprocessing import StandardScaler; self.dbscan is a
# pre-configured sklearn.cluster.DBSCAN instance)
points = StandardScaler().fit_transform(points)
db = self.dbscan.fit(points)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
n_noise_ = list(db.labels_).count(-1)

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
        for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]

    class_member_mask = (labels == k)

    xy = points[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
            markeredgecolor='k', markersize=14)

    xy = points[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
            markeredgecolor='k', markersize=6)

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()

[Figure: clustering result]

This works quite well. I was able to eliminate the false positives (see the black points). However, I still don't know how to get the mean of each cluster: after I have found the clusters, how do I loop over each one and average all its X, Y values? (Obviously on the coordinates from before StandardScaler().fit_transform(points), because after scaling I lose the pixel coordinates; they get squeezed roughly between -1 and 1.)


1 Answer

#1 · Posted 2024-06-25 06:41:44

OK, I finally figured it out. Since I also need the points at their original scale (not between -1 and 1), I had to undo the scaling as well. Anyway, here is the whole thing:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def cluster_dbscan(self, points, visualize=False):
    # scale the points to zero mean / unit variance (roughly -1 to 1)
    scaler = StandardScaler()
    scaled_points = scaler.fit_transform(points)

    # cluster
    db = DBSCAN(eps=self.clustering_epsilon,
                min_samples=self.clustering_min_samples,
                metric='euclidean')
    db.fit(scaled_points)
    core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
    core_samples_mask[db.core_sample_indices_] = True

    # number of clusters in labels, ignoring noise (label -1) if present
    n_clusters_ = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
    n_noise_ = list(db.labels_).count(-1)

    if visualize:
        # black is reserved for noise points
        unique_labels = set(db.labels_)
        colors = [plt.cm.Spectral(each)
                  for each in np.linspace(0, 1, len(unique_labels))]
        for k, col in zip(unique_labels, colors):
            if k == -1:
                # black used for noise
                col = [0, 0, 0, 1]

            class_member_mask = (db.labels_ == k)

            # core samples: large markers
            xy = scaled_points[class_member_mask & core_samples_mask]
            plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
                     markeredgecolor='k', markersize=14)

            # non-core samples: small markers
            xy = scaled_points[class_member_mask & ~core_samples_mask]
            plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
                     markeredgecolor='k', markersize=6)

        plt.title('Estimated number of clusters: %d' % n_clusters_)
        plt.show()

    # back to the original pixel scale
    points = scaler.inverse_transform(scaled_points)

    # loop over the clusters and average the member points
    centers = np.zeros((n_clusters_, 2))  # one (x, y) row per cluster
    for i in range(n_clusters_):
        cluster_points = points[db.labels_ == i]
        centers[i, :] = np.mean(cluster_points, axis=0)

    return centers
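To sanity-check the recipe, here is a condensed, standalone run on a few of the coordinates from the question; the eps and min_samples values are illustrative guesses, not the poster's actual settings:

```python
# Standalone end-to-end check of the scale -> DBSCAN -> inverse-scale ->
# per-cluster mean recipe. eps=0.5 and min_samples=2 are illustrative
# assumptions, not the poster's settings.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# a few near-duplicate centers taken from the dictionary in the question
points = np.array([
    (4067.0, 527.0), (4053.0, 527.0), (4060.0, 533.0),
    (2627.0, 707.0), (2633.0, 700.0), (2627.0, 693.0),
])

scaler = StandardScaler()
db = DBSCAN(eps=0.5, min_samples=2).fit(scaler.fit_transform(points))
labels = db.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

# average in the original pixel coordinates, one mean per cluster label
centers = np.array([points[labels == i].mean(axis=0)
                    for i in range(n_clusters)])
# -> two clusters, with means near (4060, 529) and (2629, 700)
print(centers)
```

Each group of near-duplicates collapses into a single averaged center, which is exactly the dedup the question asks for.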
