How to efficiently compute the Euclidean distance matrix for multiple time series

import numpy as np

series = np.array([
    [0., 0, 1, 2, 1, 0, 1, 0, 0],
    [0., 1, 2, 0, 0, 0, 0, 0, 0],
    [1., 2, 0, 0, 0, 0, 0, 1, 1],
    [0., 0, 1, 2, 1, 0, 1, 0, 0],
    [0., 1, 2, 0, 0, 0, 0, 0, 0],
    [1., 2, 0, 0, 0, 0, 0, 1, 1]])

3 answers

User

#1 · edited 2024-09-29 21:33:44

You can build the distance matrix in a single line of plain numpy; nothing else is needed:

np.sqrt(((series[:,None,:] - series)**2).sum(axis=2))
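To see why the one-liner works, here is the same broadcasting written out step by step. This is only a sketch: the three-row `series` array is a shortened copy of the question's data, and the intermediate names `diff` and `dist` are illustrative, not part of the answer.

```python
import numpy as np

series = np.array([
    [0., 0, 1, 2, 1, 0, 1, 0, 0],
    [0., 1, 2, 0, 0, 0, 0, 0, 0],
    [1., 2, 0, 0, 0, 0, 0, 1, 1],
])

# series[:, None, :] has shape (3, 1, 9). Subtracting series, shape (3, 9),
# broadcasts over every pair of rows and yields shape (3, 3, 9).
diff = series[:, None, :] - series

# Sum the squared differences along the time axis, then take the square root.
dist = np.sqrt((diff ** 2).sum(axis=2))
```

Note that `diff` holds n * n * m elements for n series of length m, which is what limits this approach on large inputs.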

User

#2 · edited 2024-09-29 21:33:44

You can also use scipy.spatial.distance.pdist to get the distance matrix:

from scipy.spatial.distance import pdist, squareform
squareform(pdist(series))
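It may help to look at what the two calls return separately. The sketch below assumes a shortened three-row copy of the question's data; the variable names `condensed` and `square` are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

series = np.array([
    [0., 0, 1, 2, 1, 0, 1, 0, 0],
    [0., 1, 2, 0, 0, 0, 0, 0, 0],
    [1., 2, 0, 0, 0, 0, 0, 1, 1],
])

# pdist defaults to the Euclidean metric and returns a condensed 1-D vector
# holding only the upper triangle, in the pair order (0,1), (0,2), (1,2).
condensed = pdist(series)

# squareform expands it into the full symmetric matrix with a zero diagonal.
square = squareform(condensed)
```

For n series the condensed vector has n * (n - 1) / 2 entries, so if the full square matrix is not needed, keeping the condensed form saves roughly half the memory.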

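Performance comparison with the pure numpy and euclidean_distances solutions:

So for relatively small datasets (up to 20 series of 200 elements each) pdist is the fastest, while for larger datasets euclidean_distances performs much better. Pure numpy is generally slower and may fail to allocate the intermediate array for large datasets.
Tested with np.random.randint(0, 100, (n, 10*n)).astype('int16'), numpy 1.17.4, scipy 1.4.1, sklearn 0.23.1, python 3.8.2, Win10 64-bit.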
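The allocation problem with pure numpy comes from the (n, n, m) intermediate array. One common way to avoid it, shown here as an unbenchmarked sketch, is to expand the squared distance as ‖x‖² + ‖y‖² − 2x·y so that only (n, n) arrays are created. The function name `pairwise_euclidean` and the test size (50 series of 500 elements) are assumptions made for illustration.

```python
import numpy as np

def pairwise_euclidean(x):
    """Pairwise Euclidean distances between the rows of x,
    computed without building an (n, n, m) intermediate array."""
    # Cast first so that int16 input cannot overflow in the products below.
    x = np.asarray(x, dtype=np.float64)
    sq_norms = np.einsum("ij,ij->i", x, x)  # squared norm of each row, shape (n,)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq_dist, 0.0, out=sq_dist)   # clip tiny negatives caused by rounding
    np.fill_diagonal(sq_dist, 0.0)          # a row's distance to itself is exactly 0
    return np.sqrt(sq_dist)

rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(50, 500)).astype("int16")
dist = pairwise_euclidean(data)
```

The trade-off is numerical: the subtraction can lose precision when two rows are nearly identical, which is why the result is clipped at zero before the square root. scikit-learn's documentation describes euclidean_distances as using the same expansion.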

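User

#3 · edited 2024-09-29 21:33:44

You don't need a loop at all, because the Euclidean distance between two arrays is just the square root of the sum of the element-wise squared differences, as shown below: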

def euclidean_distance(v1, v2):
    return np.sqrt(np.sum((v1 - v2)**2))
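As a quick check, the function above should agree with numpy's built-in 2-norm of the difference vector. The two vectors below are the first two rows of the question's data; the names `d_manual` and `d_norm` are illustrative.

```python
import numpy as np

def euclidean_distance(v1, v2):
    return np.sqrt(np.sum((v1 - v2) ** 2))

v1 = np.array([0., 0, 1, 2, 1, 0, 1, 0, 0])
v2 = np.array([0., 1, 2, 0, 0, 0, 0, 0, 0])

d_manual = euclidean_distance(v1, v2)
d_norm = np.linalg.norm(v1 - v2)  # 2-norm of the difference, same quantity
```

Both values equal the square root of 8, about 2.83, which matches the corresponding entry in the distance matrix.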

For the distance matrix, there is sklearn.metrics.pairwise.euclidean_distances:

from sklearn.metrics.pairwise import euclidean_distances

euclidean_distances(series).round(2)

array([[0.  , 2.83, 3.74, 0.  , 2.83, 3.74],
       [2.83, 0.  , 2.83, 2.83, 0.  , 2.83],
       [3.74, 2.83, 0.  , 3.74, 2.83, 0.  ],
       [0.  , 2.83, 3.74, 0.  , 2.83, 3.74],
       [2.83, 0.  , 2.83, 2.83, 0.  , 2.83],
       [3.74, 2.83, 0.  , 3.74, 2.83, 0.  ]])

np.allclose(
    euclidean_distances(series)[2, 3],
    euclidean_distance(series[2], series[3])
)
# True
