Normalizing a histogram with matplotlib

Posted 2024-07-03 05:50:26

I want to plot a histogram with Matplotlib, but I want the bin values to represent the percentage of the total observations. The MWE looks like this:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy

sns.set(style='dark')

imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')

luminance = numpy.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())

top_left = plt.subplot(121)
top_left.imshow(luminance)
bottom_left = plt.subplot(122)
sns.distplot(luminance.flatten(), kde_kws={"cumulative": True})

# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()

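The CDF here is fine (range: [0, 1]), but the resulting histogram is not what I expected:

[Figure: histogram with values outside the valid range]

Why are the histogram values in the range [0, 4]? Is there a way to fix this?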


2 Answers

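tel's answer is great! I just want to offer an alternative that gives you the histogram you want in fewer lines. The key idea is to use the weights argument of matplotlib's hist function to normalize the counts. You can replace sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}) with the following three statements: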

lf = luminance.flatten()
sns.kdeplot(lf, cumulative=True)
sns.distplot(lf, kde=False,
             hist_kws={'weights': numpy.full(len(lf), 1/len(lf))})

If you would rather see the histogram on a second y-axis (which is easier to read), add ax=bottom_left.twinx() to the sns.distplot call.
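Since sns.distplot is deprecated in recent seaborn releases, the same idea can be sketched with plain matplotlib. This is only an illustration under stated assumptions: the names `weights` and `cdf_ax`, the fixed random seed, the 30 bins, and the `Agg` backend are choices made for this example, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np

# Same kind of data as in the question, with a fixed seed for repeatability
rng = np.random.default_rng(0)
luminance = rng.standard_normal((1000, 1000))
luminance = (luminance - luminance.min()) / (luminance.max() - luminance.min())
lf = luminance.ravel()

fig, ax = plt.subplots(figsize=(5, 3))

# Each observation contributes 1/N, so a bar's height is the
# fraction of all observations that fall into that bin.
weights = np.full(lf.size, 1.0 / lf.size)
heights, edges, _ = ax.hist(lf, bins=30, weights=weights)

# Show the fractions as percentages on the left axis
ax.yaxis.set_major_formatter(tck.PercentFormatter(xmax=1))
ax.set_ylabel("share of observations")

# Cumulative share on a second y-axis, evaluated at the right bin edges
cdf_ax = ax.twinx()
cdf_ax.plot(edges[1:], np.cumsum(heights), color="C1")
cdf_ax.set_ylim(0, 1.05)
cdf_ax.set_ylabel("cumulative share")

fig.tight_layout()
print("sum of bar heights:", heights.sum())
```

With the weights in place the bar heights add up to 1 regardless of the bin width, which is the behavior the question asks for.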

What you think you want

Here is how to plot a histogram so that the bin heights sum to 1:

import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy as np

sns.set(style='dark')

imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')

luminance = np.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())

# get the histogram values
heights,edges = np.histogram(luminance.flat, bins=30)
binCenters = (edges[:-1] + edges[1:])/2

# norm the heights
heights = heights/heights.sum()

# get the cdf
cdf = heights.cumsum()

left = plt.subplot(121)
left.imshow(luminance)
right = plt.subplot(122)
right.plot(binCenters, cdf, binCenters, heights)

# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()

# confirm that the hist vals sum to 1
print('heights sum: %.2f' % heights.sum())

Output:

heights sum: 1.00

The real answer

This one is actually very simple. Just do:

sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}, norm_hist=True)

Here is the result I get when I run the script:

[Figure: output of the script with norm_hist=True]

Surprise twist!

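So your histogram was normalized all along, in the formal sense that the total area of the bars is 1 (the sum of bin height times bin width equals 1).

In plain English, the general practice is to normalize continuous-valued histograms (i.e. those whose observations can be represented with floating-point numbers) in terms of their density. Because your data span only [0, 1], the bar heights have to rise well above 1 (here up to about 4) for the area to equal 1. So in this case the sum of bin width times bin height will be 1.0, as you can see by running this simplified version of your script: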

import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np

imagen2 = plt.figure(1, figsize=(4,3))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')

luminance = np.random.randn(1000, 1000)
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())

heights,edges,patches = plt.hist(luminance.ravel(), density=True, bins=30)
widths = edges[1:] - edges[:-1]

totalWeight = (heights*widths).sum()

# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
print(totalWeight)

And totalWeight does indeed come out to 1.0, give or take a bit of rounding error.
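To connect the two normalizations, a density histogram can be turned into per-bin percentages by multiplying each height by its bin width. The following is a minimal sketch using only NumPy; the names `density`, `fractions`, and `percent`, the seed, and the choice of 30 bins are assumptions made for this illustration.

```python
import numpy as np

# Same kind of data as in the question, with a fixed seed for repeatability
rng = np.random.default_rng(0)
luminance = rng.standard_normal((1000, 1000))
luminance = (luminance - luminance.min()) / (luminance.max() - luminance.min())

# Density-normalized histogram: the bars enclose a total area of 1
density, edges = np.histogram(luminance.ravel(), bins=30, density=True)
widths = np.diff(edges)

# density_i = count_i / (N * width_i), so density_i * width_i is the
# fraction of observations in bin i, and those fractions sum to 1.
fractions = density * widths
percent = 100 * fractions

print("area under the density bars:", fractions.sum())
print("tallest density bar:", density.max())
print("largest per-bin share (%):", percent.max())
```

The tallest density bar is well above 1 because each bin is only about 1/30 wide, while the per-bin shares stay below 100% and add up to 100%.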
