How to implement a KS test in Python

1 Answer

User

#1 · Posted 2024-07-03 05:49:51

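The cdf argument of kstest can be a callable that implements the cumulative distribution function (CDF) of the distribution you want to test your data against. To use it, you have to implement the CDF of your bimodal distribution. You want that distribution to be a mixture of two normal distributions, so its CDF is the weighted sum of the CDFs of the two normal distributions that make up the mixture.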
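Written out, the weighted sum looks like this. The symbols are chosen here for illustration and do not appear in the original answer: w is the weight of the first component, mu_i and sigma_i are the component means and standard deviations, and Phi is the standard normal CDF.

```latex
F(x) = w\,\Phi\!\left(\frac{x-\mu_1}{\sigma_1}\right)
     + (1-w)\,\Phi\!\left(\frac{x-\mu_2}{\sigma_2}\right),
\qquad 0 \le w \le 1
```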

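Here is a script that shows how to do this. To demonstrate how kstest is used, the script runs kstest twice. First it uses a sample that is not from the bimodal distribution. As expected, kstest computes a very small p-value for this first sample. It then generates a sample that is drawn from the mixture. For this sample, the p-value is not small.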

import numpy as np
from scipy import stats


def bimodal_cdf(x, weight1, mean1, stdv1, mean2, stdv2):
    """
    CDF of a mixture of two normal distributions.
    """
    return (weight1*stats.norm.cdf(x, mean1, stdv1) +
            (1 - weight1)*stats.norm.cdf(x, mean2, stdv2))


# We only need weight1, since weight2 = 1 - weight1.
weight1 = 0.6
mean1 = 0.036
stdv1 = 0.52
mean2 = 1.25
stdv2 = 0.4

n = 200

# Create a sample from a regular normal distribution that has parameters
# similar to the bimodal distribution.
sample1 = stats.norm.rvs(0.5*(mean1 + mean2), 0.5, size=n)

# The result of kstest should show that sample1 is not from the bimodal
# distribution (i.e. the p-value should be very small).
stat1, pvalue1 = stats.kstest(sample1, cdf=bimodal_cdf,
                              args=(weight1, mean1, stdv1, mean2, stdv2))
print("sample1 p-value =", pvalue1)

# Create a sample from the bimodal distribution.  This sample is the
# concatenation of samples from the two normal distributions that make
# up the bimodal distribution.  The number of samples to take from the
# first distribution is determined by a binomial distribution of n
# trials with probability weight1.
n1 = np.random.binomial(n, p=weight1)
sample2 = np.concatenate((stats.norm.rvs(mean1, stdv1, size=n1),
                         (stats.norm.rvs(mean2, stdv2, size=n - n1))))

# Most of the time, the p-value returned by kstest with sample2 will not
# be small.  We expect the value to be uniformly distributed in the interval
# [0, 1], so in general it will not be very small.
stat2, pvalue2 = stats.kstest(sample2, cdf=bimodal_cdf,
                              args=(weight1, mean1, stdv1, mean2, stdv2))
print("sample2 p-value =", pvalue2)

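The script prints two lines, one beginning "sample1 p-value =" and one beginning "sample2 p-value =". The numbers are different each time the script is run; the first p-value is typically very small and the second typically is not.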
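The script's comments say that, for samples that really come from the mixture, the p-value should be roughly uniform on [0, 1]. A minimal sketch of how one might check this by simulation follows. The helper bimodal_rvs, the seed, and the repetition count of 500 are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
from scipy import stats


def bimodal_cdf(x, weight1, mean1, stdv1, mean2, stdv2):
    """CDF of a mixture of two normal distributions."""
    return (weight1 * stats.norm.cdf(x, mean1, stdv1)
            + (1 - weight1) * stats.norm.cdf(x, mean2, stdv2))


def bimodal_rvs(rng, n, weight1, mean1, stdv1, mean2, stdv2):
    """Hypothetical helper: draw n values from the two-normal mixture."""
    n1 = rng.binomial(n, weight1)  # how many values come from component 1
    return np.concatenate((rng.normal(mean1, stdv1, size=n1),
                           rng.normal(mean2, stdv2, size=n - n1)))


# Same parameter values as the script: weight1, mean1, stdv1, mean2, stdv2.
params = (0.6, 0.036, 0.52, 1.25, 0.4)
rng = np.random.default_rng(12345)  # arbitrary seed, for repeatability only

# Repeat the test on fresh samples that do follow the mixture.
pvalues = np.array([
    stats.kstest(bimodal_rvs(rng, 200, *params), bimodal_cdf,
                 args=params).pvalue
    for _ in range(500)
])

# If the p-values are roughly uniform, about 5% of them fall below 0.05
# and their mean is near 0.5.
frac_below = np.mean(pvalues < 0.05)
print("fraction of p-values below 0.05:", frac_below)
print("mean p-value:", pvalues.mean())
```

A single small p-value from a sample that does follow the mixture is therefore not surprising; it happens at about the rate of the chosen threshold.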

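You might find that this test is not useful for you. You have 4800 samples, but in your code the parameters have numerical values with just one or two significant digits. Unless you have good reason to believe that your sample is drawn from a distribution with exactly those parameters, kstest is likely to return a very small p-value.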
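To illustrate that warning, here is a rough sketch in which a sample of 4800 values is drawn from a mixture whose second mean differs from the value used in the test. The "true" value 1.4 and the seed are made up for this example; only the sample size and the assumed parameters come from the text above.

```python
import numpy as np
from scipy import stats


def bimodal_cdf(x, weight1, mean1, stdv1, mean2, stdv2):
    """CDF of a mixture of two normal distributions."""
    return (weight1 * stats.norm.cdf(x, mean1, stdv1)
            + (1 - weight1) * stats.norm.cdf(x, mean2, stdv2))


rng = np.random.default_rng(2024)  # arbitrary seed, for repeatability only
n = 4800

# Parameter order: weight1, mean1, stdv1, mean2, stdv2.
assumed_params = (0.6, 0.036, 0.52, 1.25, 0.4)  # values used in the script
true_params = (0.6, 0.036, 0.52, 1.4, 0.4)      # hypothetical: mean2 is off by 0.15

# Draw the sample from the "true" mixture.
w, m1, s1, m2, s2 = true_params
n1 = rng.binomial(n, w)
sample = np.concatenate((rng.normal(m1, s1, size=n1),
                         rng.normal(m2, s2, size=n - n1)))

# Test against the slightly wrong parameters, then against the true ones.
result_assumed = stats.kstest(sample, bimodal_cdf, args=assumed_params)
result_true = stats.kstest(sample, bimodal_cdf, args=true_params)

print("p-value against assumed parameters:", result_assumed.pvalue)
print("p-value against true parameters:   ", result_true.pvalue)
```

With this many samples, even a modest parameter mismatch is enough for kstest to reject the hypothesized distribution, which is why parameters given to one or two significant digits are unlikely to pass.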
