How to find the parameters of a Gumbel distribution using scipy.optimize

import math

non_truncated_data = ([15.999737471905252, 16.105716234887431, 17.947809230275304,
                       16.147752064149291, 15.991427126788327, 16.687542227378565,
                       17.125139229445359, 19.39645340792385, 16.837044960487795,
                       15.804473320190725, 16.018569387471025, 16.600876724289019,
                       16.161306985203151, 17.338636901595873, 18.477371969176406,
                       17.897236722220281, 16.626465201654593, 16.196548622931672,
                       16.013794215070927, 16.30367884232831, 17.182106070966608,
                       18.984566931768452, 16.885737663740024, 16.088051117522948,
                       15.790480003140173, 18.160947973898388, 18.318158853376037])

threshold = 15.78581825859324

def maximum_likelihood_function(non_truncated_loads, threshold, loc, scale):
    """Calculates maximum likelihood function's value for given truncated data
    with given parameters.
    Maximum likelihood function for truncated data is L1 * L2. Where L1 is a
    product of multiplication of pdf values at non-truncated known values
    (non_truncated_values). L2 is the probability that threshold value will
    be exceeded.
    """
    is_first = True
    # calculates L1
    for x in non_truncated_loads:
        if is_first:
            L1 = gumbel_pdf(x, loc, scale)
            is_first = False
        else:
            L1 *= gumbel_pdf(x, loc, scale)

    # calculates L2
    cdf_at_threshold = gumbel_cdf(threshold, loc, scale)
    L2 = 1 - cdf_at_threshold

    return L1*L2

def gumbel_pdf(x, loc, scale):
    """Returns the value of Gumbel's pdf with parameters loc and scale at x .
    """
    # exponent
    e = math.exp(1)
    # substitute
    z = (x - loc)/scale

    return (1/scale) * (e**(-(z + (e**(-z)))))

def gumbel_cdf(x, loc, scale):
    """Returns the value of Gumbel's cdf with parameters loc and scale at x.
    """
    # exponent
    e = math.exp(1)

    return (e**(-e**(-(x-loc)/scale)))

1 Answer

User

#1 · Posted 2024-10-09 20:12:15

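First, the simplest way to use the scipy.optimize optimizers is to write the objective function so that its first argument is the list of parameters to be optimized, and the remaining arguments carry everything else, such as the data and any fixed parameters.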
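A minimal sketch of that calling convention, using a toy least-squares objective rather than the Gumbel likelihood. The names `objective`, `data`, and `offset` are illustrative assumptions and do not come from the question:

```python
import numpy as np
import scipy.optimize as so

def objective(p, data, offset):
    """Sum of squared deviations; p holds the parameters being optimized."""
    center = p[0]
    return ((data - center - offset) ** 2).sum()

data = np.array([1.0, 2.0, 3.0, 4.0])
offset = 0.5  # fixed value, forwarded through args instead of being optimized

# fmin varies only p; everything in args is passed to objective unchanged
best = so.fmin(objective, [0.0], args=(data, offset), disp=False)
```

Only the entries of `p` are varied by `fmin`; `data` and `offset` reach the objective exactly as given in `args`.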

Second, the vectorization that numpy provides helps a lot here.

So we have these:

In [61]:
import numpy as np

#modified pdf and cdf
def gumbel_pdf(x, loc, scale):
    """Returns the value of Gumbel's pdf with parameters loc and scale at x .
    """
    # substitute
    z = (x - loc)/scale

    return (1./scale) * (np.exp(-(z + (np.exp(-z)))))

def gumbel_cdf(x, loc, scale):
    """Returns the value of Gumbel's cdf with parameters loc and scale at x.
    """
    return np.exp(-np.exp(-(x-loc)/scale))
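As a sanity check, the hand-written functions can be compared with `scipy.stats.gumbel_r`, which implements the same right-skewed Gumbel distribution. This is only a sketch; the test points and the `loc`/`scale` values below are arbitrary assumptions, not fitted values:

```python
import numpy as np
from scipy import stats

def gumbel_pdf(x, loc, scale):
    z = (x - loc) / scale
    return (1.0 / scale) * np.exp(-(z + np.exp(-z)))

def gumbel_cdf(x, loc, scale):
    return np.exp(-np.exp(-(x - loc) / scale))

x = np.array([15.8, 16.5, 17.9, 19.4])  # arbitrary test points
loc, scale = 16.5, 0.7                  # arbitrary parameter values

# both comparisons should hold to floating-point tolerance
pdf_ok = np.allclose(gumbel_pdf(x, loc, scale),
                     stats.gumbel_r.pdf(x, loc=loc, scale=scale))
cdf_ok = np.allclose(gumbel_cdf(x, loc, scale),
                     stats.gumbel_r.cdf(x, loc=loc, scale=scale))
```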
In [62]:

def trunc_GBL(p, x):
    threshold=p[0]
    loc=p[1]
    scale=p[2]
    x1=x[x<threshold]
    nx2=len(x[x>=threshold])
    L1=(-np.log((gumbel_pdf(x1, loc, scale)/scale))).sum()
    L2=(-np.log(1-gumbel_cdf(threshold, loc, scale)))*nx2
    #print x1, nx2, L1, L2
    return L1+L2
In [63]:

import scipy.optimize as so
In [64]:
#first we make a simple Gumbel fit
so.fmin(lambda p, x: (-np.log(gumbel_pdf(x, p[0], p[1]))).sum(), [0.5,0.5], args=(np.array(non_truncated_data),))
Optimization terminated successfully.
         Current function value: 35.401255
         Iterations: 70
         Function evaluations: 133
Out[64]:
array([ 16.47028986,   0.72449091])
In [65]:
#then we use the result as starting value for your truncated Gumbel fit
so.fmin(trunc_GBL, [17, 16.47028986,   0.72449091],  args=(np.array(non_truncated_data),))
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 25
         Function evaluations: 94
Out[65]:
array([ 13.41111111,  16.65329308,   0.79694   ])

In the trunc_GBL function, I replaced your pdf with a scaled pdf.


See the rationale here; basically it is because your L1 is based on the pdf while your L2 is based on the cdf: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifereg_sect018.htm

Then we notice a problem: see Current function value: 0.000000 in the last output. The negative log-likelihood is 0.

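This is because, at the fitted parameters, $F(C)$ (the Gumbel cdf at the threshold $C$) is effectively 0, so $-\ln\bigl(1 - F(C)\bigr)$ evaluates to 0 as well.

This means that, under the model you just described, the maximum is always reached when the threshold is low enough that L1 does not exist (x < threshold is empty) and L2 is 1 (1-F(C) is 1 for every item in the data).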

For this reason, I don't think your model is quite appropriate. You may want to reconsider it.
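The degenerate optimum can be reproduced directly. The sketch below repeats the answer's objective and evaluates it at the parameters returned by the optimizer; the short array `x` is an assumption, a handful of values rounded from the question's data to keep the example small:

```python
import numpy as np

def gumbel_pdf(x, loc, scale):
    z = (x - loc) / scale
    return (1.0 / scale) * np.exp(-(z + np.exp(-z)))

def gumbel_cdf(x, loc, scale):
    return np.exp(-np.exp(-(x - loc) / scale))

def trunc_GBL(p, x):
    # same objective as in the answer: threshold is the first free parameter
    threshold, loc, scale = p
    x1 = x[x < threshold]
    nx2 = len(x[x >= threshold])
    L1 = (-np.log(gumbel_pdf(x1, loc, scale) / scale)).sum()
    L2 = (-np.log(1 - gumbel_cdf(threshold, loc, scale))) * nx2
    return L1 + L2

# illustrative subset, rounded from the question's data (assumption)
x = np.array([15.79, 16.0, 16.6, 17.3, 18.5, 19.4])

# threshold far below every observation: no data in L1, and 1 - F(C) rounds to 1
degenerate = trunc_GBL([13.41111111, 16.65329308, 0.79694], x)
n_below = int((x < 13.41111111).sum())

# threshold inside the data range: both terms contribute
inside = trunc_GBL([17.0, 16.65329308, 0.79694], x)
```

With the threshold below all observations, `n_below` is 0 and `degenerate` is zero in floating point, whereas `inside` is strictly positive. An optimizer that is free to move the threshold will therefore always push it down rather than fit the distribution.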

Edit

We can go one step further, separate out threshold, and treat it as a fixed parameter:

def trunc_GBL(p, x, threshold):
    loc=p[0]
    scale=p[1]
    x1=x[x<threshold]
    nx2=len(x[x>=threshold])
    L1=(-np.log((gumbel_pdf(x1, loc, scale)/scale))).sum()
    L2=(-np.log(1-gumbel_cdf(threshold, loc, scale)))*nx2
    #print x1, nx2, L1, L2
    return L1+L2

And call the optimizer differently:

# X is the data as a NumPy array (presumably np.array(non_truncated_data))
so.fmin(trunc_GBL, [0.5, 0.5], args=(X, np.percentile(X, 20)))
Optimization terminated successfully.
         Current function value: 20.412818
         Iterations: 72
         Function evaluations: 136
Out[9]:
array([ 16.34594943,   0.45253201])

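This way, if you want the 70% quantile, you can simply change it to np.percentile(X, 30), and so on. np.percentile() is just another way of doing what .quantile(0.8) does; note that np.percentile() takes its argument on a 0 to 100 scale, while quantile functions take it on a 0 to 1 scale.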
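A small sketch of that correspondence, using NumPy's own `np.quantile` as the 0-to-1 counterpart (the `.quantile` method mentioned above is presumably the pandas one, which is an assumption here). The array `X` is made-up illustrative data:

```python
import numpy as np

# made-up illustrative data (assumption, not the question's full data set)
X = np.array([15.8, 16.0, 16.1, 16.6, 17.1, 17.9, 18.5, 19.4])

# the same threshold expressed on the two scales
t_percentile = np.percentile(X, 20)  # argument on a 0-100 scale
t_quantile = np.quantile(X, 0.2)     # argument on a 0-1 scale

# fraction of observations at or above the threshold
share_above = (X >= t_percentile).mean()
```

Both calls return the same threshold, so either can be passed as the fixed `threshold` argument in `args`.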
