Python: An efficient way to find the orthogonal vectors connecting many points (x, y) to the nearest point on a known function y(x)

Published 2024-09-27 09:26:52


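I have a data set consisting of a long array of x values and an equally long array of y values. For each (x, y) pair, I want to find the nearest point on a known function y(x).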

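In principle, I could loop over each pair and perform a minimization, for example with scipy.optimize.fmin_cobyla, but loops in Python are slow. SciPy's odr package looks interesting, but I can't figure out how to make it simply return the orthogonal vectors without minimizing the whole thing (setting the maximum number of iterations, "maxit", to zero doesn't give me what I need).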

Is there a simple way to do this that takes advantage of the speed of numpy arrays?


2 Answers

There is a way to speed up Hennadii Madan's approach by having numpy do the looping instead of Python. As usual, this comes at the cost of extra RAM.

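Below is the function I now use for the 2D case. A nice feature is that it is symmetric: you can swap the data sets and the computation time stays the same.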

import numpy as _n

def find_nearests_2d(x1, y1, x2, y2):
    """
    Given two data sets d1 = (x1, y1) and d2 = (x2, y2), return the x,y pairs
    from d2 that are closest to each pair from d1, the difference vectors, the
    squared distances, and the d2 indices of these closest points.

    Parameters
    ----------
    x1
        1D array of x-values for data set 1.
    y1
        1D array of y-values for data set 1 (must match size of x1).
    x2
        1D array of x-values for data set 2.
    y2
        1D array of y-values for data set 2 (must match size of x2).

    Returns x2mins, y2mins, xdiffs, ydiffs, d2mins, indices
    -------
    x2mins
        1D array of minimum-distance x-values from data set 2. One value for each x1.
    y2mins
        1D array of minimum-distance y-values from data set 2. One value for each y1.
    xdiffs
        1D array of differences in x. One value for each x1.
    ydiffs
        1D array of differences in y. One value for each y1.
    d2mins
        1D array of squared minimum distances. One value for each point in
        data set 1.
    indices
        Indices of each minimum-distance point in data set 2. One for each point in
        data set 1.
    """

    # Generate every combination of points for subtracting.
    # Every grid has shape (len(x2), len(x1)).
    x1s, x2s = _n.meshgrid(x1, x2)
    y1s, y2s = _n.meshgrid(y1, y2)

    # Calculate all the differences
    dx = x1s - x2s
    dy = y1s - y2s
    d2 = dx**2 + dy**2

    # Find the index of the minimum for each data point (one per column)
    n = _n.argmin(d2, 0)

    # Column index for extracting from the meshgrids
    m = _n.arange(len(n))

    return x2s[n,m], y2s[n,m], dx[n,m], dy[n,m], d2[n,m], n
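The RAM cost mentioned above comes from the full len(x2) × len(x1) grids. If that becomes a problem, one possible workaround is to process data set 1 in blocks. The following is only a sketch of that idea, not part of the original answer; the name `find_nearests_2d_chunked` and the `chunk` parameter are made up for illustration.

```python
import numpy as np

def find_nearests_2d_chunked(x1, y1, x2, y2, chunk=1024):
    """Nearest point of (x2, y2) for each point of (x1, y1), block by block.

    Hypothetical variant: peak memory is about len(x2) * chunk floats per
    temporary array instead of len(x2) * len(x1).
    """
    x1 = np.asarray(x1, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y2 = np.asarray(y2, dtype=float)

    indices = np.empty(len(x1), dtype=int)
    for start in range(0, len(x1), chunk):
        stop = start + chunk
        # Rows are set-2 points, columns are the set-1 points of this block
        dx = x1[start:stop][None, :] - x2[:, None]
        dy = y1[start:stop][None, :] - y2[:, None]
        indices[start:stop] = np.argmin(dx**2 + dy**2, axis=0)

    # Rebuild the per-point results from the winning indices
    x2mins = x2[indices]
    y2mins = y2[indices]
    xdiffs = x1 - x2mins
    ydiffs = y1 - y2mins
    return x2mins, y2mins, xdiffs, ydiffs, xdiffs**2 + ydiffs**2, indices

# Small usage example with a deliberately tiny block size
xc, yc, dxc, dyc, d2c, nc = find_nearests_2d_chunked(
    [0.1, 2.9, 1.2], [0.0, 3.1, 0.9],
    [0.0, 1.0, 3.0], [0.0, 1.0, 3.0], chunk=2)
```

The Python loop now runs once per block rather than once per point, so most of the numpy speed-up should be retained as long as `chunk` is reasonably large.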

You can also use this method to quickly estimate the distance between x,y pairs and a function:

[code listing for find_nearests_function not preserved]
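Judging only from the call in the test code further down, `find_nearests_function(x, y, f)` returns eight values: the nearest curve points, the difference vectors, the squared distances, the indices, and the sampled curve itself. A minimal sketch of what such a function might look like is given here. It is a hypothetical reconstruction, not the author's code: the `fpoints` parameter and the way the sampling range is chosen are assumptions.

```python
import numpy as np

def find_nearests_function(x, y, f, fpoints=1000):
    """Hypothetical sketch: nearest sampled point on y = f(x) for each (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumption: the nearest curve point is never farther away than the
    # vertical distance |f(x) - y|, so sampling x -/+ that distance suffices.
    reach = np.abs(f(x) - y)
    xf = np.linspace(np.min(x - reach), np.max(x + reach), fpoints)
    yf = f(xf)

    # Rows are curve samples, columns are data points
    dx = x[None, :] - xf[:, None]
    dy = y[None, :] - yf[:, None]
    d2 = dx**2 + dy**2

    n = np.argmin(d2, axis=0)   # best curve sample for each data point
    m = np.arange(len(x))       # matching column indices
    return xf[n], yf[n], dx[n, m], dy[n, m], d2[n, m], n, xf, yf

# Usage with the same inputs as the quick test
fx_s, fy_s, dx_s, dy_s, d2_s, n_s, xf_s, yf_s = find_nearests_function(
    [1, 2, 5], [1, -1, 1], np.cos)
```

The result is only as accurate as the sample spacing of the curve, which is roughly the sampled range divided by `fpoints`.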

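If this is part of an orthogonal distance regression (as in my case), the differences dx and dy can easily be scaled by the error-bar data sets without much overhead, so that the returned distances are studentized (unitless) residuals.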
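To make the scaling idea concrete, here is a small sketch in which each difference is divided by that point's error bar before the minimum is searched. The function name `nearest_scaled` and the arguments `ex` and `ey` (one error bar per data point) are assumptions made for this illustration.

```python
import numpy as np

def nearest_scaled(x, y, xf, yf, ex, ey):
    """Nearest sampled curve point in error-bar units (illustrative sketch)."""
    x, y, ex, ey = (np.asarray(a, dtype=float) for a in (x, y, ex, ey))
    xf = np.asarray(xf, dtype=float)
    yf = np.asarray(yf, dtype=float)

    # Unitless differences; rows are curve samples, columns are data points
    dx = (x[None, :] - xf[:, None]) / ex[None, :]
    dy = (y[None, :] - yf[:, None]) / ey[None, :]
    d2 = dx**2 + dy**2

    n = np.argmin(d2, axis=0)
    m = np.arange(len(x))
    # Scaled residual components, scaled distance, and curve indices
    return dx[n, m], dy[n, m], np.sqrt(d2[n, m]), n

# The same point (1, 0) against the line y = x, with two sets of error bars
xline = np.linspace(0.0, 2.0, 2001)
yline = xline.copy()
_, _, r_equal, n_equal = nearest_scaled([1.0], [0.0], xline, yline, [1.0], [1.0])
_, _, r_tight, n_tight = nearest_scaled([1.0], [0.0], xline, yline, [1.0], [0.1])
```

With equal error bars the nearest point is the geometric one, halfway along the line. With a much smaller error bar in y, the match moves toward the curve point whose y value agrees with the data.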

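Ultimately, this "search uniformly everywhere" technique only gets you close, and it will fail if the function is not particularly smooth over the range of the x data.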
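One possible way to get closer than the grid spacing is to re-sample a small window around each coarse estimate and shrink it repeatedly. The sketch below is not from the original answer. It assumes that each coarse estimate `z0` is already within `half_width` of the true nearest point and that the distance has a single minimum inside that window; the name `refine_nearest` and its parameters are hypothetical.

```python
import numpy as np

def refine_nearest(x, y, f, z0, half_width, n_local=21, n_iter=6):
    """Refine coarse nearest-point estimates z0 by repeated local re-gridding."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.array(z0, dtype=float)

    offsets = np.linspace(-1.0, 1.0, n_local)[:, None]  # shape (n_local, 1)
    cols = np.arange(len(z))
    h = float(half_width)
    for _ in range(n_iter):
        # Local grid around every current estimate: (n_local, n_points)
        zs = z[None, :] + h * offsets
        d2 = (x[None, :] - zs)**2 + (y[None, :] - f(zs))**2
        z = zs[np.argmin(d2, axis=0), cols]
        # The next window only needs to span one local grid spacing
        h *= 2.0 / (n_local - 1)
    return z, f(z)

# Refine estimates taken from a grid with step 0.01 on f(z) = z**2
z_ref, f_ref = refine_nearest([1.0, 2.0], [3.0, 4.0],
                              lambda z: z**2, [1.67, 1.99], half_width=0.01)
```

All points are refined at once, so the only Python loop is over the small number of zoom iterations. This does not address the smoothness caveat: if the coarse search lands in the wrong basin, the refinement stays there.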

Quick test code:

x  = [1,2,5]
y  = [1,-1,1]

def f(x): return _n.cos(x)

fxmin, fymin, dxmin, dymin, d2min, n, xf, yf = find_nearests_function(x, y, f)

import pylab
pylab.plot(x,y, marker='o', ls='', color='m', label='input points')
pylab.plot(xf,yf, color='b', label='function')
pylab.plot(fxmin,fymin, marker='o', ls='', color='r', label='nearest points')
pylab.legend()
pylab.show()

This produces:

[Figure: the input points, the function curve, and the nearest points on the curve]

The answer is simple:

  1. Do not loop over the points in your list.
  2. Loop over the points on your function curve.

To avoid confusion, I took the liberty of renaming your function y(x) to f(z).

import numpy as np

# x and y are your numpy arrays of point coords
x = np.array([1,2])
y = np.array([3,4])
# this is your "y(x)" function
def f(z):
    return z**2

xmin = x.min()
xmax = x.max()
step = 0.01 # choose your step at the precision you want

# find distances to every point
zpoints = np.arange(xmin,xmax,step)
distances_squared = np.array([(y-f(z))**2+(x-z)**2 for z in zpoints])

# find z coords of closest points
zmin = zpoints[distances_squared.argmin(axis=0)]
fmin = np.array([f(z) for z in zmin])

for i in range(len(x)):
    print("point on the curve {},{} is closest to {},{}".format(zmin[i],fmin[i],x[i],y[i]))

point on the curve 1.6700000000000006,2.788900000000002 is closest to 1,3

point on the curve 1.9900000000000009,3.9601000000000033 is closest to 2,4
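The list comprehension over `zpoints` is still a Python-level loop. As a sketch of how it could be handed to numpy as well, the same distances can be computed with broadcasting. This assumes `f` accepts numpy arrays, which holds for `z**2`; everything else follows the code above.

```python
import numpy as np

def f(z):
    return z**2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
step = 0.01

# Same sample points as above (np.arange excludes the upper end)
zpoints = np.arange(x.min(), x.max(), step)

# Broadcasting: rows are curve samples, columns are data points
distances_squared = ((y[None, :] - f(zpoints)[:, None])**2
                     + (x[None, :] - zpoints[:, None])**2)

# z coordinates of the closest curve samples and the curve values there
zmin = zpoints[distances_squared.argmin(axis=0)]
fmin = f(zmin)
```

As in the original, the curve is only sampled between the smallest and largest x of the data, so a nearest point lying outside that range would be missed.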
