在PyMC3中定义数字(自定义)似然函数

2024-10-02 00:39:12 发布

您现在位置:Python中文网/ 问答频道 /正文

在看了几个问题/答案(1234567891011)和PyMC3的documentation之后,我成功地创建了mcmcc设置的MCVE(见下文)。在

我的拟合参数是连续的和离散的,所以用pm.Uniform和{}来定义优先级(对后者应用了重新缩放)。我的似然函数特别复杂(它涉及到比较我观察到的数据和一些使用自由参数生成的合成数据的N维直方图),所以我不得不用theano@as_op运算符来编写它。在

这里显示的实现可以在一个处理随机数据的玩具模型上工作,但是在我的实际模型中,可能性和参数非常相似。在

我的问题是:

  1. 这是对的吗?有什么我应该做的不同吗?在
  2. 对似然函数的调用被抛出,显然什么也没有做,也没有任何关联。这样做对吗?在
  3. 我使用NUTS作为连续参数,但是由于我的可能性是数值,我认为我不应该这样做。既然代码还在运行,我肯定是怎么回事。在

这是我第一次使用PyMC3,因此任何指针都将非常有用。在

import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op


def main():
    trace = bayesMCMC()
    print(pm.summary(trace))
    pm.traceplot(trace)
    plt.show()


def bayesMCMC():
    """
    Define and process the full model.
    """
    with pm.Model() as model:
        # Define uniform priors.
        A = pm.Uniform("A", lower=0., upper=5.)
        B = pm.Uniform("B", lower=10., upper=20.)
        C = pm.Uniform("C", lower=0., upper=1.)

        # Define discrete priors.
        minD, maxD, stepD = 0.005, 0.06, 0.005
        ND = int((maxD - minD) / stepD)
        D = pm.DiscreteUniform("D", 0., ND)
        minE, maxE, stepE = 9., 10., 0.05
        NE = int((maxE - minE) / stepE)
        E = pm.DiscreteUniform("E", 0., NE)

        # Is this correct??
        logp(A, B, C, D, E)

        step1 = pm.NUTS(vars=[A, B, C])
        print("NUTS")
        step2 = pm.Metropolis(vars=[D, E])
        print("Metropolis")

        trace = pm.sample(300, [step1, step2])  # , start)

        return trace


@as_op(
    itypes=[tt.dscalar, tt.dscalar, tt.dscalar, tt.lscalar, tt.lscalar],
    otypes=[tt.dscalar])
def logp(A, B, C, D, E):
    """
    Likelihood evaluation.
    """
    # Get observed data and some extra info to re-scale the discrete parameters
    obsData, minD, stepD, minE, stepE = obsservedData()

    # Scale discrete parameters
    D, E = D * stepD + minD, E * stepE + minE

    # Generate synthetic data using the prior values
    synthData = synthetic(A, B, C, D, E)

    # Generate N-dimensional histograms for both data sets.
    obsHist, edges = np.histogramdd(obsData)
    synHist, _ = np.histogramdd(synthData, bins=edges)

    # Flatten both histograms
    obsHist_f, synHist_f = obsHist.ravel(), synHist.ravel()
    # Remove all bins where N_bin=0.
    binNzero = obsHist_f != 0
    obsHist_f, synHist_f = obsHist_f[binNzero], synHist_f[binNzero]
    # Assign small value to the 0 elements in synHist_f to avoid issues with
    # the log()
    synHist_f[synHist_f == 0] = 0.001

    # Compare the histograms of the observed and synthetic data via a Poisson
    # likelihood ratio.
    lkl = -2. * np.sum(synHist_f - obsHist_f * np.log(synHist_f))

    return lkl


def obsservedData():
    """Some 'observed' data."""
    np.random.seed(12345)
    N = 1000
    obsData = np.random.uniform(0., 10., (N, 3))

    minD, stepD = 0.005, 0.005
    minE, stepE = 9., 0.05

    return obsData, minD, stepD, minE, stepE


def synthetic(A, B, C, D, E):
    """
    Dummy function to generate synthetic data. The actual function makes use
    of the A, B, C, D, E variables (obviously).
    """
    M = np.random.randint(100, 1000)
    synthData = np.random.uniform(0., 10., (M, 3))
    return synthData


if __name__ == "__main__":
    main()

Tags: theimportdatadefasnptracett

热门问题