Calculate the distance between one point in a matrix and all other points

3 answers

User

Answer 1 · edited 2024-09-27 19:23:23

Here's one approach using SciPy's cdist:

import numpy as np
from scipy.spatial.distance import cdist

def closest_rows(a):
    # Get squared Euclidean distances as a 2D array; squaring preserves the
    # ordering of the distances, so argmin is unaffected and sqrt is skipped
    dists = cdist(a, a, 'sqeuclidean')

    # Fill the diagonal with something greater than all elements, since we
    # intend to get argmin indices later on and then index into the input
    # array with those indices to get the closest rows
    dists.ravel()[::dists.shape[1]+1] = dists.max()+1
    return a[dists.argmin(1)]
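A minimal variant of the same idea, assuming NumPy and SciPy are installed: because squaring is monotonic for non-negative values, the squared distances yield the same argmin as the true Euclidean distances, and np.fill_diagonal with np.inf is another way to stop each point from matching itself. The name closest_rows_inf is hypothetical and not part of the answer above.

```python
import numpy as np
from scipy.spatial.distance import cdist

def closest_rows_inf(a):
    """Return, for each row of a, the nearest *other* row (hypothetical helper)."""
    # Squared Euclidean distances: same ordering as Euclidean, no sqrt needed
    dists = cdist(a, a, 'sqeuclidean')
    # Mask the diagonal so a point is never its own nearest neighbour
    np.fill_diagonal(dists, np.inf)
    return a[dists.argmin(axis=1)]

a = np.array([[1, 2, 8],
              [7, 4, 2],
              [9, 1, 7],
              [0, 1, 5],
              [6, 4, 3]])
print(closest_rows_inf(a))
```

Using np.inf avoids computing dists.max() and stays correct even if every off-diagonal distance happens to be very large.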

Sample run:

In [72]: a
Out[72]: 
array([[1, 2, 8],
       [7, 4, 2],
       [9, 1, 7],
       [0, 1, 5],
       [6, 4, 3]])

In [73]: closest_rows(a)
Out[73]: 
array([[0, 1, 5],
       [6, 4, 3],
       [6, 4, 3],
       [1, 2, 8],
       [7, 4, 2]])

Runtime test

Other working approach:

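def norm_app(a): # @Psidom's soln
    # Full pairwise Euclidean distance matrix via broadcasting
    dist = np.linalg.norm(a - a[:,None], axis=-1)
    # Set the diagonal (each point's distance to itself) to NaN so nanargmin skips it
    dist[np.arange(dist.shape[0]), np.arange(dist.shape[0])] = np.nan
    return a[np.nanargmin(dist, axis=0)]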

Timings with 10,000 points:

In [79]: a = np.random.randint(0,9,(10000,3))

In [80]: %timeit norm_app(a) # @Psidom's soln
1 loop, best of 3: 3.83 s per loop

In [81]: %timeit closest_rows(a)
1 loop, best of 3: 392 ms per loop

Further performance boost

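There's the eucl_dist package (disclaimer: I'm its author) that implements various methods to compute Euclidean distances that are more efficient than SciPy's cdist, especially for large arrays.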

Thus, making use of it, we would have a more performant version, like so:

from eucl_dist.cpu_dist import dist
def closest_rows_v2(a):
    dists = dist(a,a, matmul="gemm", method="ext") 
    dists.ravel()[::dists.shape[1]+1] = dists.max()+1
    return a[dists.argmin(1)]
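The internals of eucl_dist are not shown in this answer. As a hedged, NumPy-only sketch of the general matrix-multiplication idea that fast distance routines commonly rely on (an assumption, not necessarily what eucl_dist does), squared distances can be expanded as ||x||² + ||y||² − 2·x·y, so the dominant cost becomes a single a @ a.T product. sq_dists_gemm and closest_rows_gemm are hypothetical names.

```python
import numpy as np

def sq_dists_gemm(a):
    """Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    sq_norms = np.einsum('ij,ij->i', a, a)       # ||x_i||^2 for every row
    d = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (a @ a.T)  # one GEMM call
    # Floating-point cancellation can produce tiny negatives; clamp them to 0
    np.maximum(d, 0.0, out=d)
    return d

def closest_rows_gemm(a):
    """Nearest other row for each row, using the GEMM-based distances (hypothetical)."""
    d = sq_dists_gemm(a)
    np.fill_diagonal(d, np.inf)   # exclude each point itself
    return np.asarray(a)[d.argmin(axis=1)]

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])
print(closest_rows_gemm(a))
```

The expansion trades some numerical accuracy for speed: for points with large coordinates that lie very close together, cancellation error in the subtraction can affect which neighbour is chosen.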

Timings:

In [162]: a = np.random.randint(0,9,(10000,3))

In [163]: %timeit closest_rows(a)
1 loop, best of 3: 394 ms per loop

In [164]: %timeit closest_rows_v2(a)
1 loop, best of 3: 229 ms per loop

User

Answer 2 · edited 2024-09-27 19:23:23

Using np.linalg.norm combined with broadcasting (numpy outer subtraction), you can do:

np.linalg.norm(a - a[:,None], axis=-1)

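a[:,None] inserts a new axis into a, turning its shape from (n, 3) into (n, 1, 3), so a - a[:,None] broadcasts to shape (n, n, 3) and performs a row-by-row subtraction. np.linalg.norm then computes np.sqrt(np.sum(np.square(...))) over the last axis: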

a = np.array([[1,2,8],
     [7,4,2],
     [9,1,7],
     [0,1,5],
     [6,4,3]])

np.linalg.norm(a - a[:,None], axis=-1)
#array([[ 0.        ,  8.71779789,  8.1240384 ,  3.31662479,  7.34846923],
#       [ 8.71779789,  0.        ,  6.164414  ,  8.18535277,  1.41421356],
#       [ 8.1240384 ,  6.164414  ,  0.        ,  9.21954446,  5.83095189],
#       [ 3.31662479,  8.18535277,  9.21954446,  0.        ,  7.        ],
#       [ 7.34846923,  1.41421356,  5.83095189,  7.        ,  0.        ]])
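To make the broadcasting step concrete, this small sketch (assuming the same five-point array) prints the intermediate shapes and cross-checks the result against an explicit double loop.

```python
import numpy as np

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])

print(a.shape)                # (5, 3)
print(a[:, None].shape)       # (5, 1, 3): new axis inserted at position 1
diff = a - a[:, None]         # broadcasts (5, 3) against (5, 1, 3) -> (5, 5, 3)
print(diff.shape)             # (5, 5, 3)

# diff[i, j] == a[j] - a[i]; the norm over the last axis gives the (5, 5) distance matrix
dist = np.linalg.norm(diff, axis=-1)

# Cross-check against an explicit double loop
loop = np.array([[np.sqrt(np.sum((a[i] - a[j]) ** 2)) for j in range(len(a))]
                 for i in range(len(a))])
print(np.allclose(dist, loop))  # True
```

Note that the (n, n, 3) intermediate grows quadratically with the number of points, which matters for large inputs.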

For example, elements [0,1] and [0,2] correspond, respectively, to:

np.sqrt(np.sum((a[0] - a[1]) ** 2))
# 8.717797887081348

np.sqrt(np.sum((a[0] - a[2]) ** 2))
# 8.1240384046359608

User

Answer 3 · edited 2024-09-27 19:23:23

I suggest using pdist and squareform from scipy.spatial.distance.

Consider the following array of points:

a = np.array([[1,2,8], [7,4,2], [9,1,7], [0,1,5], [6,4,3]])

To display all pairwise distances (row 0 holds the distances between the point [1,2,8] and the other points):

squareform(pdist(a))

Out[1]: array([[ 0.        ,  8.71779789,  8.1240384 ,  3.31662479,  7.34846923],
               [ 8.71779789,  0.        ,  6.164414  ,  8.18535277,  1.41421356],
               [ 8.1240384 ,  6.164414  ,  0.        ,  9.21954446,  5.83095189],
               [ 3.31662479,  8.18535277,  9.21954446,  0.        ,  7.        ],
               [ 7.34846923,  1.41421356,  5.83095189,  7.        ,  0.        ]])
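pdist returns a condensed 1-D vector that stores each of the n(n-1)/2 unordered pairs once, and squareform expands it into the symmetric n×n matrix shown above. A short sketch, assuming SciPy is available; condensed_index is a hypothetical helper showing where pair (i, j) lives in the condensed vector.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])
n = len(a)

condensed = pdist(a)          # Euclidean by default; shape (n*(n-1)//2,)
print(condensed.shape)        # (10,)

full = squareform(condensed)  # symmetric (n, n) matrix with zeros on the diagonal
print(full.shape)             # (5, 5)

def condensed_index(i, j, n):
    """Position of pair (i, j), i != j, in pdist's condensed output (hypothetical helper)."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)

# Every off-diagonal entry of the square matrix comes from the condensed vector
print(all(full[i, j] == condensed[condensed_index(i, j, n)]
          for i in range(n) for j in range(n) if i != j))  # True
```

If only a few pairs are needed, indexing the condensed vector directly avoids building the full n×n matrix.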

To display the shortest distance between the point [1,2,8] and its nearest neighbour:

sorted(squareform(pdist(a))[0])[1]

Out[2]: 3.3166247903553998

[0] is the index of the first point ([1,2,8]).

[1] selects the second-smallest value (to skip the zero distance from the point to itself).

To display the index of the point closest to [1,2,8]:

np.argsort(squareform(pdist(a))[0])[1]

Out[3]: 3
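One caveat with taking position [1] after sorting: if the array contains duplicate points, a row has several zeros, and argsort may then return the point's own index. A hedged alternative sketch, assuming the same pdist/squareform setup, masks the diagonal instead so a point can never select itself; nearest_other is a hypothetical name.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def nearest_other(points):
    """Index and distance of the nearest *other* point for every row (hypothetical helper)."""
    d = squareform(pdist(points))
    np.fill_diagonal(d, np.inf)      # a point is never its own neighbour
    idx = d.argmin(axis=1)
    return idx, d[np.arange(len(d)), idx]

# Row 4 duplicates row 0, so the nearest other point of row 0 is at distance 0
pts = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [1, 2, 8]])
idx, dist = nearest_other(pts)
print(idx[0], dist[0])   # 4 0.0
```

This also returns the nearest neighbour for every point in one pass, rather than re-sorting one row at a time.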
