Binomial test in Python for very large numbers

3 Answers

User

#1 · edited 2024-05-20 21:00:21

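Edited to add this comment: please note that, as Daniel Stutzbach mentioned, the "binomial test" is probably not what the original poster was asking for (though he did use that expression). He seems to be asking for the probability density function of a binomial distribution, which is not what I'm suggesting below.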

Have you tried scipy.stats.binom_test?

rbp@apfelstrudel ~$ python
Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from scipy import stats
>>> print stats.binom_test.__doc__

    Perform a test that the probability of success is p.

    This is an exact, two-sided test of the null hypothesis
    that the probability of success in a Bernoulli experiment
    is `p`.

    Parameters
    ----------
    x : integer or array_like
        the number of successes, or if x has length 2, it is the
        number of successes and the number of failures.
    n : integer
        the number of trials.  This is ignored if x gives both the
        number of successes and failures
    p : float, optional
        The hypothesized probability of success.  0 <= p <= 1. The
        default value is p = 0.5

    Returns
    -------
    p-value : float
        The p-value of the hypothesis test

    References
    ----------
    .. [1] http://en.wikipedia.org/wiki/Binomial_test


>>> stats.binom_test(500, 10000)
4.9406564584124654e-324

Small edit to add a link to the documentation: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom_test.html#scipy.stats.binom_test

BTW: this works on scipy 0.7.2, as well as on the current 0.8dev.

User

#2 · edited 2024-05-20 21:00:21

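Any solution that looks like comb(n, k) * 0.5**k * 0.5**(n-k) isn't going to work for large n. On most (all?) platforms, the smallest normalized value a Python float can store is around 2**-1022. For large n-k or large k, the right-hand side will get rounded to 0. Likewise, comb(n, k) can grow so large that it no longer fits in a float.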
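To make both failure modes concrete, here is a minimal sketch. The values n = 10000 and k = 5000 are my own illustrative choice, and math.comb assumes Python 3.8 or later:

```python
from math import comb

n, k = 10000, 5000  # illustrative values, not taken from the question

# The product of the two powers is far below the smallest representable
# float, so it underflows silently to zero instead of raising an error.
power_term = 0.5 ** k * 0.5 ** (n - k)
print(power_term == 0.0)

# comb() returns an exact Python int, which is fine on its own, but the
# conversion to float needed for the multiplication overflows.
coefficient = comb(n, k)
try:
    float(coefficient)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)
```

The underflow is the more dangerous of the two, because nothing signals that the result is wrong.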

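A more robust approach is to compute the probability density function as the difference between two consecutive points of the cumulative distribution function, which can be computed using the regularized incomplete beta function (look in SciPy's "special functions" package). Mathematically: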

pdf(p, n, k) = cdf(p, n, k) - cdf(p, n, k-1)
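For reference, the standard identity linking the binomial CDF to the regularized incomplete beta function I_x(a, b) is spelled out below, using the same pdf/cdf names as above. As far as I know SciPy exposes this function as scipy.special.betainc(a, b, x), but treat that mapping as something to check against the SciPy docs rather than as part of this answer:

```latex
\mathrm{cdf}(p, n, k)
  = \sum_{i=0}^{k} \binom{n}{i}\, p^{i} (1-p)^{n-i}
  = I_{1-p}(n-k,\; k+1), \qquad 0 \le k < n,
\qquad \mathrm{cdf}(p, n, n) = 1

\mathrm{pdf}(p, n, k)
  = I_{1-p}(n-k,\; k+1) - I_{1-p}(n-k+1,\; k), \qquad 1 \le k < n
```

The boundary cases k = 0 and k = n need separate handling, because one of the beta arguments would otherwise be zero.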

Another option is to use the Normal approximation, which is quite accurate for large n. If speed is a concern, this is probably the way to go:

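from math import comb, exp, pi, sqrt  # math.comb requires Python 3.8+

def normal_pdf(x, m, v):
    # Density of a normal distribution with mean m and variance v, evaluated at x
    return 1.0/sqrt(2*pi*v) * exp(-(x-m)**2/(2*v))

def binomial_pdf(p, n, k):
    if n < 100:
        # Small n: fall back to your current method (the direct formula)
        return comb(n, k) * p**k * (1-p)**(n-k)
    # Large n: normal approximation with mean n*p and variance n*p*(1-p)
    return normal_pdf(k, n*p, n*p*(1.0-p))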

I haven't tested the code, but that should give you the general idea.
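As a rough sanity check on how close the approximation gets, the sketch below compares it with an exact value for n = 1000, k = 500, p = 0.5. The exact reference uses fractions.Fraction, which is my own choice for the comparison and not part of the method above; exact_binomial_pmf is a hypothetical helper name, and math.comb assumes Python 3.8 or later:

```python
from fractions import Fraction
from math import comb, exp, pi, sqrt

def normal_pdf(x, m, v):
    # Density of a normal distribution with mean m and variance v
    return 1.0 / sqrt(2 * pi * v) * exp(-(x - m) ** 2 / (2 * v))

def exact_binomial_pmf(p, n, k):
    # Hypothetical reference helper: exact rational arithmetic avoids
    # underflow and overflow, at the cost of being slow for very large n.
    q = Fraction(p)
    return float(comb(n, k) * q ** k * (1 - q) ** (n - k))

n, k, p = 1000, 500, 0.5
approx = normal_pdf(k, n * p, n * p * (1.0 - p))
exact = exact_binomial_pmf(p, n, k)
rel_err = abs(approx - exact) / exact
print(approx, exact, rel_err)
```

At the center of the distribution the two agree to within a fraction of a percent here. The approximation is generally weaker far out in the tails, so it is worth checking at the k values you actually care about.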

User

#3 · edited 2024-05-20 21:00:21

GMPY also supports extended-precision floating-point calculations. For example:

>>> from gmpy import *
>>>
>>> def f(n,k,p,prec=256):
...     return mpf(comb(n,k),prec) * mpf(p,prec)**k * mpf(1-p,prec)**(n-k)
...
>>> print(f(1000,500,0.5))
0.0252250181783608019068416887621024545529410193921696384762532089115753731615931
>>>

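I specified a floating-point precision of 256 bits. By the way, the SourceForge version is out of date. The current version is maintained at code.google.com and supports Python 3.x. (Disclaimer: I'm the current maintainer of gmpy.)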
