Efficiently finding neighbors in multidimensional space and computing sums of values based on proximity

3 answers

Answer 1 · edited 2024-10-06 12:12:39

This solution requires no additional packages beyond pandas and NumPy.

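The functions below define the distance between two points a and b. Shown here are the Euclidean, Manhattan, and Chebyshev distances (credit to @Peter Leimbigler's answer for recognizing that the last one is the distance the OP uses). a and b are assumed to be lists of length 3. You can use any one of them, or define your own custom distance function.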

import numpy as np

def euclidean(a, b):
    """Euclidean distance"""
    return np.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2 + (a[2] - b[2])**2)

def manhattan(a, b):
    """Manhattan distance"""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(a[2] - b[2])

def cebyshev(a, b):
    """Chebyshev distance"""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]), abs(a[2] - b[2]))
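
As a quick worked example of how the three metrics differ, the sketch below evaluates them for one pair of points. The helper name all_distances and the sample points are made up for illustration; it uses NumPy so it works for any number of dimensions, not just 3.

```python
import numpy as np

def all_distances(a, b):
    """Return the three distances between points a and b (hypothetical helper)."""
    # absolute coordinate differences, e.g. [1, 2, 2] for the points below
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return {
        'euclidean': float(np.sqrt((diff ** 2).sum())),  # square root of summed squares
        'manhattan': float(diff.sum()),                  # sum of absolute differences
        'chebyshev': float(diff.max()),                  # largest absolute difference
    }

result = all_distances([0, 0, 0], [1, -2, 2])
print(result)
# {'euclidean': 3.0, 'manhattan': 5.0, 'chebyshev': 2.0}
```

The Chebyshev distance is never larger than the Euclidean one, which in turn is never larger than the Manhattan one, so the same radius selects different neighbourhoods under each metric.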

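For a given point, the following function returns the sum of the val column over the rows of the DataFrame data (your DataFrame) whose coordinates lie within distance d of that point. func is the function used to compute the distance (one of the functions above).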

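def getclosesum(data, point, d, func):
    # pass plain 3-element lists so func can index them by position
    p = [point['x'], point['y'], point['z']]
    dists = data[['x', 'y', 'z']].apply(lambda row: func(row.tolist(), p), axis=1)
    return data.loc[dists <= d, 'val'].sum()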

Finally, you can compute the columns with df.apply:

for n in range(3):
    df['n{0}'.format(n)] = df.apply(lambda x : getclosesum(df, x, n, cebyshev), axis=1)

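With your sample DataFrame, this code takes 0.155 seconds on my machine, while your original code takes 0.233 seconds.
So it is faster than your solution, but not as fast as the code provided by @Peter Leimbigler (I bet scikit is better optimized).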
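
The row-by-row apply above can also be replaced by a single broadcasted pairwise distance matrix. The following is only a sketch under assumptions: df_demo is a small made-up DataFrame with the same column layout as the question, and the approach needs memory proportional to the square of the number of points, so it suits small inputs.

```python
import numpy as np
import pandas as pd

# Made-up sample with the same columns as the question's DataFrame
df_demo = pd.DataFrame({'x': [0, 1, 2, 0],
                        'y': [0, -1, -2, 1],
                        'z': [0, 0, 0, -1],
                        'val': [11, 10, 2, 7]})

coords = df_demo[['x', 'y', 'z']].to_numpy()
vals = df_demo['val'].to_numpy()

# diff[i, j, :] holds coords[i] - coords[j] for every pair of points
diff = coords[:, None, :] - coords[None, :, :]
# Chebyshev distance: largest absolute coordinate difference per pair
cheb = np.abs(diff).max(axis=2)

for n in range(3):
    # boolean mask of pairs within distance n, multiplied by the value vector
    df_demo[f'n{n}'] = (cheb <= n) @ vals

print(df_demo)
```

Each nK column is a matrix-vector product of the neighbourhood mask with the values, so no Python-level loop over rows is needed.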

Answer 2 · edited 2024-10-06 12:12:39

This solution also uses KDTrees (from the scipy library).

In your code and in the previous answers, when the loop computes the result for radius=3, it repeats the work already done for radius=0, 1, and 2.

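The code below does all the calculations in a single pass over the nodes. Define a maximum distance and a number of range bins. Find all pairs of nodes that lie within the maximum distance of each other, and use np.digitize() to map each actual distance to a range bin. Then add "val" to the mapped range bin.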

import pandas as pd
import numpy as np

from scipy.spatial import cKDTree as KDTree

# define the range and number of range bins 
# this example defines 3 bins: 0.0 - 1.0; 1.0 - 2.0; 2.0 - 3.0
max_distance = 3.0
nbins = 3
bin_range = 0.0, max_distance
bins = np.linspace(*bin_range, nbins+1)[1:]

# build a KDTree and generate a sparse matrix of node pairs
# that have a max distance of bin_range[-1]
tree = KDTree(df[['x','y','z']])
dist = tree.sparse_distance_matrix(tree, bin_range[-1])

# one row per node, one column per range bin
sums = np.zeros((len(df), nbins))

# for each pair of nodes, map the distance to the bin index and add
# the value of the second node to the mapped bin for the first node
# (k is a row position in the tree, so look the value up by position)
for (j, k), d in dist.items():
    sums[j][np.digitize(d, bins)] += df['val'].iloc[k]
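
To see how np.digitize assigns distances to bins in the loop above, here is a small standalone sketch. The sample distances are made up for illustration; the bin edges are built the same way as in the code above.

```python
import numpy as np

max_distance = 3.0
nbins = 3
# upper edges of the bins: array([1., 2., 3.])
bins = np.linspace(0.0, max_distance, nbins + 1)[1:]

# illustrative distances, including values that sit exactly on an edge
sample = np.array([0.0, 1.0, np.sqrt(2), np.sqrt(6), np.sqrt(8), 3.0])
idx = np.digitize(sample, bins)
print(idx.tolist())
# [0, 1, 1, 2, 2, 3]
```

A distance that falls exactly on an edge goes to the higher bin, so a distance exactly equal to max_distance maps to index nbins, which is one past the last column of sums. The sample data does not seem to contain such a pair, but other data might, and those pairs would then need separate handling (for example, one extra column).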

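For each node, the array sums contains one row holding the sums for the binned ranges. For example, the first column contains the sum of val for nodes at a distance < 1, the second column for nodes at a distance between 1 and 2, and the third column for nodes at a distance between 2 and 3. You can accumulate across the columns to get the same results as in the table.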

sums

array([[ 0.,  1., 21.],
       [ 0.,  0., 25.],
       [ 0.,  6., 11.],
       [ 1., 10., 43.],
       [ 0., 19., 51.],
       [ 0., 17., 40.],
       [ 6.,  0., 25.],
       [ 3., 22., 49.],
       [ 7., 47., 45.],
       [11., 35., 65.],
       [ 0., 31., 42.],
       [ 0., 10., 23.],
       [14., 48., 37.],
       [18., 77., 10.],
       [10., 50., 47.],
       [ 4., 12., 50.],
       [20., 47., 33.],
       [15., 50., 36.],
       [ 2., 29., 49.]])
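
The accumulation across columns mentioned above can be sketched with np.cumsum. To keep the example self-contained, sums_demo below is just the first three rows of the array printed above.

```python
import numpy as np

# first three rows of the sums array shown above
sums_demo = np.array([[0., 1., 21.],
                      [0., 0., 25.],
                      [0., 6., 11.]])

# running total along each row: column k becomes the sum over bins 0..k
cumulative = np.cumsum(sums_demo, axis=1)
print(cumulative)
# [[ 0.  1. 22.]
#  [ 0.  0. 25.]
#  [ 0.  6. 17.]]
```

Each row now holds the sum within the first bin, within the first two bins, and within all three bins.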

Answer 3 · edited 2024-10-06 12:12:39

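Finding nearest neighbors in k-dimensional space is a classic use case for the k-d tree data structure (Wikipedia). scikit-learn has a flexible implementation (docs), which I use below because the conditional logic in your question appears to define the Chebyshev distance metric (Wikipedia), which scikit-learn supports natively. SciPy's cKDTree (docs, C++ source code) uses the Euclidean (L2) metric by default, although its query methods accept a Minkowski p argument (p=np.inf corresponds to Chebyshev); it is heavily optimized and may therefore be faster.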

# Setup
import pandas as pd
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19], 
                   'x':[-2,-2,-2,-1,-1,-1,-1,0,0,0,0,0,1,1,1,1,2,2,2], 
                   'y':[2,1,0,2,1,0,-1,2,1,0,-1,-2,1,0,-1,-2,0,-1,-2], 
                   'z':[0,1,2,-1,0,1,2,-2,-1,0,1,2,-2,-1,0,1,-2,-1,0], 
                   'val':[0,0,0,1,0,0,6,3,7,11,0,0,14,18,10,4,20,15,2]})
df.set_index('id', inplace=True)


from sklearn.neighbors import KDTree

# Build k-d tree with the Chebyshev metric, AKA L-infinity
tree = KDTree(df[['x', 'y', 'z']].values, metric='chebyshev')

for radius in [0, 1, 2]:
    # Populate new column with placeholder integer
    df[f'n{radius}'] = -1
    for i, row in df.iterrows():
        coords = row[['x', 'y', 'z']].values.reshape(1, -1)
        idx = tree.query_radius(coords, r=radius)[0]
        df.loc[i, f'n{radius}'] = df.iloc[idx]['val'].sum()

df
    x  y  z  val  n0  n1   n2
id                           
1  -2  2  0    0   0   1   22
2  -2  1  1    0   0   0   25
3  -2  0  2    0   0   6   17
4  -1  2 -1    1   1  11   54
5  -1  1  0    0   0  19   70
6  -1  0  1    0   0  17   57
7  -1 -1  2    6   6   6   31
8   0  2 -2    3   3  25   74
9   0  1 -1    7   7  54   99
10  0  0  0   11  11  46  111
11  0 -1  1    0   0  31   73
12  0 -2  2    0   0  10   33
13  1  1 -2   14  14  62   99
14  1  0 -1   18  18  95  105
15  1 -1  0   10  10  60  107
16  1 -2  1    4   4  16   66
17  2  0 -2   20  20  67  100
18  2 -1 -1   15  15  65  101
19  2 -2  0    2   2  31   80
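
For comparison, the same neighbour sums can also be sketched with SciPy's cKDTree, whose query_ball_point method takes a Minkowski p argument; p=np.inf selects the Chebyshev norm. This is a minimal sketch rather than a tuned implementation: neighbour_sums is a hypothetical helper name, and the data is copied from the setup above into plain NumPy arrays.

```python
import numpy as np
from scipy.spatial import cKDTree

# Coordinates and values copied from the setup above (ids 1..19 in order)
x = [-2, -2, -2, -1, -1, -1, -1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
y = [2, 1, 0, 2, 1, 0, -1, 2, 1, 0, -1, -2, 1, 0, -1, -2, 0, -1, -2]
z = [0, 1, 2, -1, 0, 1, 2, -2, -1, 0, 1, 2, -2, -1, 0, 1, -2, -1, 0]
val = np.array([0, 0, 0, 1, 0, 0, 6, 3, 7, 11, 0, 0, 14, 18, 10, 4, 20, 15, 2])
points = np.column_stack([x, y, z])

tree = cKDTree(points)

def neighbour_sums(radius):
    """Sum val over all points within Chebyshev distance `radius` of each point."""
    # query every point at once; the result holds one list of row positions per point
    neighbours = tree.query_ball_point(points, r=radius, p=np.inf)
    return np.array([val[idx].sum() for idx in neighbours])

n1 = neighbour_sums(1)
n2 = neighbour_sums(2)
print(n1[9], n2[9])  # row position 9 is id 10, the point at the origin
# 46 111
```

Querying all points in one call avoids the per-row iterrows loop; the values for id 10 agree with the n1 and n2 columns in the table above.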
