Fitting a polynomial with numpy changes with dtype, even though the actual data values stay the same

Posted 2024-10-04 13:29:02


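I have a data set consisting of xdata and ydata that I want to fit a polynomial to, but for some reason the fit result depends on the dtype of the data set, even though the actual values of the data stay the same. I understand that changing the dtype, for example from float to int, can lose information, but here I am converting from 'f4' to 'f8', so no information is lost, which is why I am confused. What is going on here?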

import numpy as np
from numpy.polynomial import polynomial

x32 = np.array([
    1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
    1895.7642, 1896.4286, 1896.5693, 1897.313,  1898.4648
], dtype='f4')

y32 = np.array([
    510.83655, 489.91592, 486.4508,  469.21814, 465.7902,  388.65576,
    385.37637, 369.07236, 365.8301,  349.7118,  327.4062
], dtype='f4')

x64 = x32.astype('f8')
y64 = y32.astype('f8')

a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True)

c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True)

print(residuals1, residuals2, residuals3, residuals4)  # [] [195.86309188] [] [195.86309157]
print(a)        # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(b)        # [-8.70836523e-03  7.50419309e-02  3.15525483e+04]
print(c[::-1])  # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(d[::-1])  # [-8.7083541e-03   7.5099051e-02   3.1552398e+04 ]

I only noticed the problem because I am also interested in the residual values: they came back empty, which crashed my program.


2 Answers

The difference in behavior comes from rcond in polyfit, whose default depends on the precision of the input:

    rcond : float, optional
        Relative condition number of the fit. Singular values smaller than
        this relative to the largest singular value will be ignored. The
        default value is len(x)*eps, where eps is the relative precision of
        the float type, about 2e-16 in most cases.

...

    # set rcond
    if rcond is None:
        rcond = len(x)*finfo(x.dtype).eps

For the 32-bit example, setting rcond to a suitably small value (for example rcond=1e-7 or smaller) produces the same result as the 64-bit example.
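To make the dtype dependence concrete, the default threshold from the source snippet above can be evaluated by hand. This is only a small sketch: n_points is an illustrative name, set to 11 because the question's arrays have 11 samples.

```python
import numpy as np

n_points = 11  # number of samples in the question's x32 / x64 arrays

# Machine epsilon of each float type, converted to plain Python floats
eps32 = float(np.finfo(np.float32).eps)
eps64 = float(np.finfo(np.float64).eps)

# Same formula as in np.polyfit: rcond = len(x) * finfo(x.dtype).eps
rcond32 = n_points * eps32
rcond64 = n_points * eps64

print(f"float32: eps = {eps32:.3e}, default rcond = {rcond32:.3e}")
print(f"float64: eps = {eps64:.3e}, default rcond = {rcond64:.3e}")
```

For 11 points the float32 cutoff is roughly 1.3e-6 and the float64 cutoff roughly 2.4e-15, so a small singular value that survives in the 64-bit fit can be discarded in the 32-bit one.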

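The difference arises because the hidden rcond parameter of polyfit() has a different default for float32 and for float64. It is a relative cutoff below which singular values are ignored, and its default is len(x) times the machine epsilon of the dtype: eps is about 1.2e-7 for float32 and about 2.2e-16 for float64. If you pass the same rcond yourself, you get the same result.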

The code below passes rcond explicitly and also plots the fits using np.polyval, to show that the results are visually almost identical.


import numpy as np
from numpy.polynomial import polynomial
import matplotlib.pyplot as plt

x32 = np.array([
    1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
    1895.7642, 1896.4286, 1896.5693, 1897.313,  1898.4648
], dtype = 'f4')

y32 = np.array([
    510.83655, 489.91592, 486.4508,  469.21814, 465.7902,  388.65576,
    385.37637, 369.07236, 365.8301,  349.7118,  327.4062
], dtype = 'f4')

x64 = x32.astype('f8')
y64 = y32.astype('f8')

rcond = 2e-7

a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True, rcond = rcond)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True, rcond = rcond)

c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True, rcond = rcond)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True, rcond = rcond)

print(residuals1, residuals2, residuals3, residuals4)  
# [] [195.86309188] [] [195.86309157]
print(a)  # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(b)  # [-8.70836523e-03  7.50419309e-02  3.15525483e+04]
print(c)  # [ 1.28004924e+07 -1.34738721e+04  3.54575804e+00]
print(d)  # [ 3.1552398e+04  7.5099051e-02 -8.7083541e-03]

plt.plot(x64, y64, label = 'orig')
plt.plot(x32, np.polyval(a, x32), label = 'x32_v0')
plt.plot(x64, np.polyval(b, x64), label = 'x64_v0')
plt.plot(x32, np.polyval(c[::-1], x32), label = 'x32_v1')
plt.plot(x64, np.polyval(d[::-1], x64), label = 'x64_v1')
plt.legend()
plt.show()
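Since the question mentions a crash caused by the empty residuals array, one defensive pattern is to cast to float64 before fitting (the case that returned a residual in the question) and to check whether a residual was returned at all. This is a minimal sketch, not part of either answer; fit_poly_f8 is a hypothetical helper name.

```python
import numpy as np

def fit_poly_f8(x, y, deg):
    """Hypothetical helper: fit in float64 and make a missing residual explicit."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # With full=True, np.polyfit returns (coeffs, residuals, rank, singular_values, rcond)
    coeffs, residuals, rank, _, _ = np.polyfit(x, y, deg, full=True)
    # residuals is an empty array when the fit is rank deficient
    # (or when there are no more points than coefficients)
    sse = float(residuals[0]) if residuals.size else None
    return coeffs, sse, rank

x32 = np.array([
    1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
    1895.7642, 1896.4286, 1896.5693, 1897.313,  1898.4648
], dtype='f4')

y32 = np.array([
    510.83655, 489.91592, 486.4508,  469.21814, 465.7902,  388.65576,
    385.37637, 369.07236, 365.8301,  349.7118,  327.4062
], dtype='f4')

coeffs, sse, rank = fit_poly_f8(x32, y32, 2)
if sse is None:
    print(f"fit is rank deficient (rank {rank}); no residual available")
else:
    print(coeffs, sse)
```

Returning None instead of an empty array forces the caller to handle the rank-deficient case explicitly rather than failing later on residuals[0].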

