如何正确抽样截短分布？

# Sample exponential distribution for the case x>0 import numpy as np import matplotlib.pyplot as plt from scipy.stats import norm def pdf(x): return x*np.exp(-x) xvec=np.zeros(1000000) x=1. for i in range(1000000): a=x+np.random.normal() xs=x if a > 0. : xs=a A=pdf(xs)/pdf(x) if np.random.uniform()<A : x=xs xvec[i]=x x=np.linspace(0,15,1000) plt.plot(x,pdf(x)) plt.hist([x for x in xvec if x != 0],bins=150,normed=True) plt.show()

import numpy as np import matplotlib.pyplot as plt from scipy.stats import norm def pdf(x): return x*np.exp(-x) # included the corresponding cdf def cdf(x): return 1. -np.exp(-x)-x*np.exp(-x) xvec=np.zeros(1000000) x=1. for i in range(1000000): a=x+np.random.normal() xs=x if a > 0.5 : xs=a A=pdf(xs)/pdf(x) if np.random.uniform()<A : x=xs xvec[i]=x x=np.linspace(0,15,1000) # new part norm the analytic pdf to fix the area plt.plot(x,pdf(x)/(1.-cdf(0.5))) plt.hist([x for x in xvec if x != 0],bins=200,normed=True) plt.savefig("test_exp.png") plt.show()

2条回答

网友

1楼 · 编辑于 2024-07-05 14:26:52

你说你想学习抽样截断分布的基本思想，但是你的来源是一篇关于 Metropolis–Hastings algorithm？你真的需要这种“从概率分布中获得一系列随机样本的方法吗？”，对于这种概率分布，直接抽样是困难的？以此为出发点就像通过阅读莎士比亚来学习英语。

截尾法向

对于截尾正态分布，基本拒绝抽样就是您所需要的：为原始分布生成样本，拒绝超出范围的样本。正如Leandro Caniglia指出的，您不应该期望截断的分布具有相同的PDF，除非间隔较短-这显然是不可能的，因为PDF图形下的区域总是1。如果你从侧面剪掉一些东西，中间必须有更多的内容；PDF会被重新缩放。

一次采集10万个样本，效率很低。我会一次采集10万个正常样本，只接受那些合适的样本；然后重复，直到我有足够的样本。amin和amax之间截短正态抽样示例：

import numpy as np
n_samples = 100000
amin, amax = -1, 2
samples = np.zeros((0,))    # empty for now
while samples.shape[0] < n_samples: 
    s = np.random.normal(0, 1, size=(n_samples,))
    accepted = s[(s >= amin) & (s <= amax)]
    samples = np.concatenate((samples, accepted), axis=0)
samples = samples[:n_samples]    # we probably got more than needed, so discard extra ones

这里是与PDF曲线的比较，按cdf(amax) - cdf(amin)除法重新缩放，如上所述。

^{pr2}$

截断多元正态

现在我们要保持第一个坐标在amin和amax之间，第二个坐标在bmin和bmax之间。同样的故事，只是有一个2列数组，并且与边界的比较是以一种相对隐蔽的方式完成的：

(np.min(s - [amin, bmin], axis=1) >= 0) & (np.max(s - [amax, bmax], axis=1) <= 0)

这意味着：从每一行中减去amin，bmin，只保留两个结果都为非负的行（这意味着a>；=amin和b>；=bmin）。对amax，bmax也做类似的事情。只接受同时满足这两个条件的行。

n_samples = 10
amin, amax = -1, 2
bmin, bmax = 0.2, 2.4
mean = [0.3, 0.5]
cov = [[2, 1.1], [1.1, 2]]
samples = np.zeros((0, 2))   # 2 columns now
while samples.shape[0] < n_samples: 
    s = np.random.multivariate_normal(mean, cov, size=(n_samples,))
    accepted = s[(np.min(s - [amin, bmin], axis=1) >= 0) & (np.max(s - [amax, bmax], axis=1) <= 0)]
    samples = np.concatenate((samples, accepted), axis=0)
samples = samples[:n_samples, :]

这里不打算绘制，但这里有一些值：自然，在范围内。

array([[ 0.43150033,  1.55775629],
       [ 0.62339265,  1.63506963],
       [-0.6723598 ,  1.58053835],
       [-0.53347361,  0.53513105],
       [ 1.70524439,  2.08226558],
       [ 0.37474842,  0.2512812 ],
       [-0.40986396,  0.58783193],
       [ 0.65967087,  0.59755193],
       [ 0.33383214,  2.37651975],
       [ 1.7513789 ,  1.24469918]])

网友

2楼 · 编辑于 2024-07-05 14:26:52

要从整个密度函数pdf计算截断的密度函数pdf_t，请执行以下操作：

设[a, b]为截断间隔；（x轴）
设A := cdf(a)和B := cdf(b)；（cdf=非截断累积分布函数）
那么pdf_t(x) := pdf(x) / (B - A)如果x在[a, b]和{}其他地方。在

在a = -infinity（分别为。b = +infinity），取{}（分别为。B := 1）。

至于你看到的“神秘感”

请注意，您的蓝色曲线错误。它不是截断分布的pdf，它只是非截断分布的pdf，按正确的数量缩放（除以1-cdf(0.5)）。实际截断的pdf曲线从x = 0.5上的一条垂直线开始，一直向上直到到达当前的蓝色曲线。换句话说，您只缩放了曲线，但忘了截断它，在本例中是向左的。这样的截断对应于上述算法中步骤3的“0别处”部分。

截尾法向

截断多元正态

相关问题更多 >

编程相关推荐

热门问题

热门文章