用最小二乘法拟合一组具有不同模型的乙状结肠()?

2024-09-24 22:29:58 发布

您现在位置:Python中文网/ 问答频道 /正文

对于在不同时间点测量的一系列实验,我试图使用信息标准(如Akaike IC)将所有实验的固定参数拟合与每个实验都有一组单独参数的拟合进行比较。用scipy.optimize.least_squares可以吗

假设测量值在时间上大致遵循sigmoid曲线,参数为base(步骤前水平),height(步骤后水平增加),stretch(步骤持续时间)和t50(步骤前水平增加的时间)。我在下面的代码中对它们进行建模,它返回一个size[t, c]numpy ndarraymeasurements,每个时间点有t个时间点(实验)和c个值:

import numpy as np;
import matplotlib.pyplot as plt;

# lifted, scaled, stretched and shifted sigmoid
def lssig ( t, base, height, stretch, t50 ):
return base + height / ( 1 + np.exp ( -1/stretch * ( t - t50 ) ) );

# number of curves per experiment
numcurves = 10;

# number of experiments (time pts)
num_experiments = 20;

# timepoints (with simulated variability in timing)
time_jitter = 2;
tstart  = np.linspace   ( -10, 30, num = num_experiments );
tstart += time_jitter * np.random.rand ( num_experiments );

# measurements + added noise (this is added noise, not the variability in the estimates)
measurements  = np.ndarray       ( shape = ( num_experiments, numcurves ) );
measure_noise = np.random.normal ( 0, .2,  ( num_experiments, numcurves ) );

# Variability of model parameters between curves:   
# par_spreading [0] -> variability in base (start value)
# par_spreading [1] -> variability in height (end - start)
# par_spreading [2] -> variability in stretch (duration of step)
# par_spreading [3] -> variability in t50 (middle of step)
par_spreading = np.ones ( 4 );

# parameters of the model:   
# every curve can have its own parameters, variability between curves 
# is given by the par_spreading value (see above) for that parameter    
height  = 8 * np.ones ( numcurves ) + par_spreading [0] * np.random.rand ( numcurves);
base    = 2 * np.ones ( numcurves ) + par_spreading [1] * np.random.rand ( numcurves);
stretch = 4 * np.ones ( numcurves ) + par_spreading [2] * np.random.rand ( numcurves);
t50     = 9 * np.ones ( numcurves ) + par_spreading [2] * np.random.rand ( numcurves);

# fill the measurement array
for t in range(num_experiments):
    for c in range (numcurves):
    
        measurements[ t, c ] = lssig ( tstart [t],
                                     base  [c], 
                                     height   [c],
                                     stretch   [c],
                                     t50 [c],
                                     );
measurements += measure_noise;
   
# plot curves per subject
plt.plot ( tstart, measurements );
plt.show ();

different time points (from left to right) with different experiments at each point (different colours)

有一个非常好的page示例,其中使用了最小二乘法,其中拟合了1个sigmoid,并且sigmoid的参数在数组theta中传递。使用该配方,我可以在每个时间点measurements[:,0]拟合第一次实验的曲线:

from scipy.optimize import least_squares;

# point on the sigmoid, using array theta for parameters
def y ( theta, t ):
    return theta[0] + theta[1] / ( 1 + np.exp ( -1/theta[2] * ( t - theta[3] ) ) );

# objective function for least_squares: difference between noisy data & model
def fun(theta):
    return y(theta, tstart) - measurements[:,0];

# start with values for base, step, stretch and t50
theta0 = [8,2,4,9];
res1   = least_squares(fun, theta0);

# show the parameters & plot the functions
print      ( 'real: {}, estimated: {}'.format ( [ base[0], height[0], stretch[0], t50[0] ],             res1.x ) );
#
plt.plot   (  tstart, measurements[:,0],  tstart, y(res1.x,tstart) );
plt.legend ( ('noisy', 'lsq' ) );
plt.show   ( );

我在上面的代码中使用4个独立参数的原因是base可以是单个标量(每个实验的所有曲线都相同)或数组(每条曲线都有一个单独的base值)。可以这样使用最小二乘法吗?还是所有的参数都需要放在一维数组中

可以展平参数数组(如theta),但这会变得非常混乱,因为在每个实验都有自己的base参数的模型中,第二个参数将是base的值,但在所有实验都有一个base参数的模型中,数组中的第二个值是height的值,依此类推

理想情况下,参数仍然是分开的,因此它们的长度可以独立变化。 是否有一种管理上可追溯/美学上令人愉悦的方法来做到这一点


Tags: theinbase参数npheightexperimentsmeasurements
1条回答
网友
1楼 · 发布于 2024-09-24 22:29:58

也许您可以通过提供一个最小的可重复的示例来减少问题的长度并提高可读性!;)

如果我很了解你的问题:

为了避免处理一组“混乱”的参数,我经常创建两个小函数,例如param2minimizer和它的对立面minimizer2param。这些函数用于组织最小值的所有参数。这样代码就更容易理解了

例如,假设您想要优化任意数量的[base,height]

def param2minimizer(bases,heights):
    return np.concatenate((bases,heights))

def minimizer2param(m):
    L = len(m)
    if L%2: raise ValueError("Incompatible length")
    bases = m[0:L//2]
    heights = m[L//2:]
    return bases, heights

您可以修改这两个函数以完全满足您的需要

关于最小化变量的维数问题,您可以看看documentation of least_squares。我认为系统地给出一个向量(1D)而不是一个矩阵(2D)是一个更好的习惯,即使一个特定的极小值是2D允许的。有些极小化子只允许一维,而且优化算法通常在理论上和数值上描述在一维变量向量上。使用1D更安全,更符合使用要求

相关问题 更多 >