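<p>That is to be expected. A random sample from a uniform distribution does not produce uniform values (i.e. values that are all relatively close to each other). With a little calculus, it can be shown that the <em>expected</em> value (in the statistical sense) of the Gini coefficient of a sample from the uniform distribution on [0, 1] is 1/3, so getting a value around 1/3 for a given sample is reasonable.</p>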
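<p>One way to fill in the calculus: let X and Y be independent and uniform on [0, 1]. The mean absolute difference is E|X - Y|, the mean is E[X] = 1/2, and the Gini coefficient is half of their ratio. This computes the population value, which the sample Gini coefficient approaches as the sample size grows:</p>
<pre><code>E|X - Y| = \int_0^1 \int_0^1 |x - y| \, dy \, dx
         = 2 \int_0^1 \int_0^x (x - y) \, dy \, dx
         = 2 \int_0^1 \frac{x^2}{2} \, dx = \frac{1}{3}

G = \frac{1}{2} \cdot \frac{E|X - Y|}{E[X]} = \frac{1}{2} \cdot \frac{1/3}{1/2} = \frac{1}{3}
</code></pre>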
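<p>You'll get a lower Gini coefficient with a sample such as <code>v = 10 + np.random.rand(500)</code>. Those values are all close to 10.5; the <em>relative</em> variation is lower than in the sample <code>v = np.random.rand(500)</code>.
In fact, the expected value of the Gini coefficient for the sample <code>base + np.random.rand(n)</code> is 1/(6*base + 3).</p>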
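<p>A quick numerical check of that formula. Shifting the sample by <code>base</code> leaves the mean absolute difference at 1/3 but raises the mean to base + 1/2, so G = (1/3)/(2*base + 1) = 1/(6*base + 3). The sketch below averages the sample Gini coefficient over repeated draws and prints it next to that prediction. It redefines <code>gini</code> so it runs on its own and assumes NumPy 1.17 or later (for <code>np.random.default_rng</code>); the helper <code>average_gini</code> and its trial count are hypothetical choices for illustration, not part of the original answer.</p>
<pre><code>import numpy as np


def gini(x):
    # Gini coefficient as half the relative mean absolute difference
    # (the same O(n**2) definition used in this answer).
    x = np.asarray(x, dtype=float)
    mad = np.abs(np.subtract.outer(x, x)).mean()
    return 0.5 * mad / x.mean()


def average_gini(base, n=500, trials=100, seed=0):
    # Hypothetical helper: average the sample Gini coefficient of
    # base + U[0, 1) over many independent samples of size n.
    rng = np.random.default_rng(seed)
    return np.mean([gini(base + rng.random(n)) for _ in range(trials)])


for base in [0, 1, 10, 100]:
    predicted = 1 / (6 * base + 3)
    print(base, average_gini(base), predicted)
</code></pre>
<p>With n = 500 the averages should land close to 1/3, 1/9, 1/63 and 1/603.</p>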
<p>Here is a simple implementation of the Gini coefficient. It uses the fact that the Gini coefficient is half of the <a href="https://en.wikipedia.org/wiki/Mean_absolute_difference#Relative_mean_absolute_difference" rel="noreferrer">relative mean absolute difference</a>.</p>
<pre><code>import numpy as np


def gini(x):
    # (Warning: This is a concise implementation, but it is O(n**2)
    # in time and memory, where n = len(x).  *Don't* pass in huge
    # samples!)

    # Mean absolute difference over all n**2 ordered pairs
    mad = np.abs(np.subtract.outer(x, x)).mean()
    # Relative mean absolute difference
    rmad = mad / np.mean(x)
    # Gini coefficient
    g = 0.5 * rmad
    return g
</code></pre>
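<p>Because of the O(n**2) cost, a variant that sorts the data first may help with large samples. This is a sketch, not part of the original answer: it relies on the identity that, for sorted values x_(1) ≤ ... ≤ x_(n) and 1-based rank i, the sum of |x_i - x_j| over all ordered pairs equals 2 * sum_i (2i - n - 1) * x_(i). Substituting that into the definition above gives the same value in O(n log n) time and O(n) memory. The name <code>gini_sorted</code> is hypothetical.</p>
<pre><code>import numpy as np


def gini_sorted(x):
    # O(n log n) sketch: sort once, then weight each value by its rank.
    # Equivalent to 0.5 * mean(|x_i - x_j|) / mean(x) over all n**2 pairs.
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)  # 1-based ranks of the sorted values
    return np.sum((2 * i - n - 1) * x) / (n * np.sum(x))


v = np.random.default_rng(0).random(500)
print(gini_sorted(v))  # roughly 1/3 for a uniform sample on [0, 1)
</code></pre>
<p>For any input with a positive mean it should agree with <code>gini(x)</code> up to floating-point rounding, so it can be checked against the O(n**2) version on small samples.</p>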
<p>Here is the Gini coefficient for several samples of the form <code>v = base + np.random.rand(500)</code>:</p>
<pre><code>In [80]: v = np.random.rand(500)
In [81]: gini(v)
Out[81]: 0.32760618249832563
In [82]: v = 1 + np.random.rand(500)
In [83]: gini(v)
Out[83]: 0.11121487509454202
In [84]: v = 10 + np.random.rand(500)
In [85]: gini(v)
Out[85]: 0.01567937753659053
In [86]: v = 100 + np.random.rand(500)
In [87]: gini(v)
Out[87]: 0.0016594595244509495
</code></pre>