Calculating the Gini coefficient in Python/numpy

import numpy as np
import matplotlib.pyplot as plt

def G(v):
    bins = np.linspace(0., 100., 11)
    total = float(np.sum(v))
    yvals = []
    for b in bins:
        # share of total held by values at or below the b-th percentile
        bin_vals = v[v <= np.percentile(v, b)]
        bin_fraction = (np.sum(bin_vals) / total) * 100.0
        yvals.append(bin_fraction)
    # perfect equality area
    pe_area = np.trapz(bins, x=bins)
    # lorenz area
    lorenz_area = np.trapz(yvals, x=bins)
    gini_val = (pe_area - lorenz_area) / float(pe_area)
    return bins, yvals, gini_val

v = np.random.rand(500)
bins, result, gini_val = G(v)
plt.figure()
plt.subplot(2, 1, 1)
plt.plot(bins, result, label="observed")
plt.plot(bins, bins, '--', label="perfect eq.")
plt.xlabel("fraction of population")
plt.ylabel("fraction of wealth")
plt.title("GINI: %.4f" %(gini_val))
plt.legend()
plt.subplot(2, 1, 2)
plt.hist(v, bins=20)
plt.show()

2 answers

User

#1 · edited 2024-09-27 21:33:21

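The Gini coefficient is defined from the Lorenz curve: it is the area between the line of perfect equality and the Lorenz curve, divided by the total area under the line of perfect equality (not the area under the Lorenz curve itself). It is usually used to analyze how income is distributed across a population. https://github.com/oliviaguest/gini provides a simple Python implementation.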
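To make the Lorenz-curve definition concrete, here is a minimal sketch that builds the curve from a sorted sample and applies the trapezoid rule. With both axes scaled to run from 0 to 1, the area under the line of perfect equality is 1/2, so the Gini coefficient is one minus twice the area under the Lorenz curve. The name `gini_from_lorenz` is hypothetical, the code assumes non-negative values with a positive sum, and it is not the code from the linked repository.

```python
import numpy as np


def gini_from_lorenz(x):
    """Gini coefficient as 1 - 2 * (area under the Lorenz curve)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Lorenz curve: cumulative share of the total, starting at (0, 0).
    lorenz = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    # Trapezoid rule on n equally spaced intervals of width 1/n.
    area = (lorenz[:-1] + lorenz[1:]).sum() / (2.0 * n)
    return 1.0 - 2.0 * area
```

A sample in which every value is equal gives 0, and a sample of size n in which one member holds everything gives (n - 1)/n.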

User

#2 · edited 2024-09-27 21:33:21

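This is to be expected. A random sample from a uniform distribution does not produce uniform values (that is, values that are all relatively close to each other). With a little calculus, it can be shown that the expected value (in the statistical sense) of the Gini coefficient of a sample from the uniform distribution on [0, 1] is 1/3, so getting a value of around 1/3 for a given sample is reasonable.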

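You will get a lower Gini coefficient with a sample such as v = 10 + np.random.rand(500). Those values are all close to 10.5, so the relative variation is lower than in the sample v = np.random.rand(500). In fact, the expected value of the Gini coefficient for the sample base + np.random.rand(n) is 1/(6*base + 3).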
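The "little calculus" can be sketched as follows, using the fact that the Gini coefficient is half the relative mean absolute difference. The symbols here are assumptions introduced for the sketch: X and Y are independent draws from the uniform distribution on [b, b + 1], where b stands for `base`, and the mean is written as mu. The result is the population value, which a finite sample only approximates.

```latex
\begin{aligned}
\mathbb{E}\,|X - Y|
  &= \int_0^1\!\!\int_0^1 |x - y| \, dy \, dx
   = 2 \int_0^1\!\!\int_0^x (x - y) \, dy \, dx
   = 2 \int_0^1 \frac{x^2}{2} \, dx
   = \frac{1}{3}, \\
\mu &= b + \tfrac{1}{2}, \\
G &= \frac{\mathbb{E}\,|X - Y|}{2\mu}
   = \frac{1/3}{2b + 1}
   = \frac{1}{6b + 3}.
\end{aligned}
```

The first line does not depend on b because shifting both draws by the same amount leaves their difference unchanged. Setting b = 0 gives 1/3.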

Here is a simple implementation of the Gini coefficient. It uses the fact that the Gini coefficient is half the relative mean absolute difference.

import numpy as np

def gini(x):
    # (Warning: This is a concise implementation, but it is O(n**2)
    # in time and memory, where n = len(x).  *Don't* pass in huge
    # samples!)

    # Mean absolute difference
    mad = np.abs(np.subtract.outer(x, x)).mean()
    # Relative mean absolute difference
    rmad = mad/np.mean(x)
    # Gini coefficient
    g = 0.5 * rmad
    return g
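If the O(n**2) cost is a concern, one possible alternative is to sort the sample first. For values sorted in ascending order with 1-based rank i, the sum of all pairwise absolute differences equals 2 * sum((2*i - n - 1) * x_i), which yields the same number in O(n log n) time and O(n) memory. The following is a sketch under that identity, not part of the original answer; the name `gini_sorted` is hypothetical.

```python
import numpy as np


def gini_sorted(x):
    """Gini coefficient via sorting: O(n log n) time, O(n) memory."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    # sum_{i,j} |x_i - x_j| = 2 * sum_i (2*i - n - 1) * x_i for sorted x,
    # so G = sum_{i,j} |x_i - x_j| / (2 * n**2 * mean) simplifies to:
    return np.sum((2 * ranks - n - 1) * x) / (n * np.sum(x))
```

On small inputs it should agree with the pairwise version up to floating-point rounding, which makes the pairwise version a convenient reference for testing.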

Here is the Gini coefficient for several samples of the form v = base + np.random.rand(500):

In [80]: v = np.random.rand(500)

In [81]: gini(v)
Out[81]: 0.32760618249832563

In [82]: v = 1 + np.random.rand(500)

In [83]: gini(v)
Out[83]: 0.11121487509454202

In [84]: v = 10 + np.random.rand(500)

In [85]: gini(v)
Out[85]: 0.01567937753659053

In [86]: v = 100 + np.random.rand(500)

In [87]: gini(v)
Out[87]: 0.0016594595244509495
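The same comparison can be scripted so that each sample value is printed next to the prediction 1/(6*base + 3). This is a rough sketch: `gini_pairwise` restates the half relative mean absolute difference definition, while `compare_with_formula`, the sample size of 1000, and the fixed seed are assumptions chosen for illustration.

```python
import numpy as np


def gini_pairwise(x):
    """Half the relative mean absolute difference (O(n**2) memory)."""
    x = np.asarray(x, dtype=float)
    mad = np.abs(np.subtract.outer(x, x)).mean()
    return 0.5 * mad / x.mean()


def compare_with_formula(bases=(0, 1, 10, 100), n=1000, seed=0):
    """Return (base, sample Gini, 1/(6*base + 3)) for each base."""
    rng = np.random.default_rng(seed)
    rows = []
    for base in bases:
        v = base + rng.random(n)
        rows.append((base, gini_pairwise(v), 1.0 / (6 * base + 3)))
    return rows


for base, observed, predicted in compare_with_formula():
    print(f"base={base:>3}  sample={observed:.5f}  formula={predicted:.5f}")
```

The sample values fluctuate from run to run when the seed changes, so they should only be expected to land near the formula, not on it.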
