How do I make a Pareto chart in Python?

Published 2024-05-19 14:31:20


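Pareto charts are a very popular diagnostic tool in Excel and Tableau. In Excel it is easy to draw a Pareto chart, but I have not found an easy way to draw one in Python.

I have a pandas DataFrame like this: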

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
print(df)

         country
USA        177.0
Canada       7.0
Russia       4.0
UK           2.0
Belgium      2.0
Mexico       1.0
Germany      1.0
Denmark      1.0

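How can I draw a Pareto chart? Perhaps with pandas, seaborn, matplotlib, etc.?

So far I have been able to make a descending bar chart, but I still need to overlay a cumulative-sum line plot on top of it.

My attempt: df.sort_values(by='country',ascending=False).plot.bar()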

Desired plot: (image not included)


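2 Answers

You probably want to create a new column holding the cumulative percentage, then plot one column as a bar chart and the other as a line plot on a twin axis.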

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
df = df.sort_values(by='country',ascending=False)
df["cumpercentage"] = df["country"].cumsum()/df["country"].sum()*100


fig, ax = plt.subplots()
ax.bar(df.index, df["country"], color="C0")
ax2 = ax.twinx()
ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
ax2.yaxis.set_major_formatter(PercentFormatter())

ax.tick_params(axis="y", colors="C0")
ax2.tick_params(axis="y", colors="C1")
plt.show()
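
As a quick check on what the line in this chart represents, the cumulative percentages for the question's data can be worked out by hand. The following is a minimal sketch using only the standard library; the names `counts`, `total` and `cum_pct` are illustrative and not part of the answer's code.

```python
from itertools import accumulate

# Counts from the question, already in descending order.
counts = {"USA": 177, "Canada": 7, "Russia": 4, "UK": 2,
          "Belgium": 2, "Mexico": 1, "Germany": 1, "Denmark": 1}

total = sum(counts.values())
# Running total divided by the grand total, expressed as a percentage.
cum_pct = [running / total * 100 for running in accumulate(counts.values())]

for name, pct in zip(counts, cum_pct):
    print(f"{name:8s} {pct:6.2f}%")
```

The first point of the line sits at 177/195, roughly 90.8%, and the last point is always 100%.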


A more generic version of ImportanceOfBeingErnest's code:

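import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

def create_pareto_chart(df, by_variable, quant_variable):
    # by_variable: category labels; quant_variable: a numeric Series.
    # Both are assumed to be sorted by value in descending order already.
    df.index = by_variable
    # .to_numpy() makes the assignment positional; otherwise pandas would align
    # on quant_variable's old index and could fill the column with NaN.
    df["cumpercentage"] = (quant_variable.cumsum()/quant_variable.sum()*100).to_numpy()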

    fig, ax = plt.subplots()
    ax.bar(df.index, quant_variable, color="C0")
    ax2 = ax.twinx()
    ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())

    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")
    plt.show()
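
A function like this changes the DataFrame passed to it and expects sorted input. One possible alternative, sketched below under my own assumptions, takes a Series indexed by category, sorts a copy internally, and returns the figure and axes rather than calling plt.show(). The name `pareto_axes` and the unsorted sample Series are hypothetical and are not taken from the answer.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter


def pareto_axes(values):
    """Draw a Pareto chart from a Series indexed by category (hypothetical helper)."""
    # sort_values returns a new Series, so the caller's data is left untouched.
    ordered = values.sort_values(ascending=False)
    cum_pct = ordered.cumsum() / ordered.sum() * 100

    fig, ax = plt.subplots()
    ax.bar(ordered.index, ordered.to_numpy(), color="C0")
    ax2 = ax.twinx()  # second y-axis sharing the same x-axis
    ax2.plot(ordered.index, cum_pct.to_numpy(), color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())
    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")
    return fig, ax, ax2


# A deliberately unsorted subset of the question's data.
counts = pd.Series(
    [4.0, 177.0, 1.0, 7.0],
    index=["Russia", "USA", "Mexico", "Canada"],
)
fig, ax, ax2 = pareto_axes(counts)
# Call plt.show() or fig.savefig(...) here to display or save the figure.
```

Returning the axes leaves titles, label rotation and saving to the caller.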

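This version also groups categories by a threshold. For example, if you set it to 70, the minor categories that fall beyond a cumulative share of 70% are merged into a single group called "OTHERS".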

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

def create_pareto_chart(by_variable, quant_variable, threshold):
    # Sort first, so that the cumulative percentage follows the bar order.
    df = pd.DataFrame({'by_var': by_variable, 'quant_var': quant_variable})
    df = df.sort_values(by='quant_var', ascending=False)
    total = df['quant_var'].sum()
    df["cumpercentage"] = df['quant_var'].cumsum()/total*100

    # Keep the categories whose cumulative share is below the threshold.
    df_above_threshold = df[df['cumpercentage'] < threshold]
    # Everything else becomes one 'OTHERS' bar, which closes the curve at 100%.
    rest_sum = total - df_above_threshold['quant_var'].sum()
    rest = pd.DataFrame({'by_var': ['OTHERS'], 'quant_var': [rest_sum], 'cumpercentage': [100.0]})
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    df = pd.concat([df_above_threshold, rest], ignore_index=True)
    df.index = df['by_var']

    fig, ax = plt.subplots()
    ax.bar(df.index, df["quant_var"], color="C0")
    ax2 = ax.twinx()
    ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())

    ax.tick_params(axis="x", colors="C0", labelrotation=70)
    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")

    plt.show()
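
The grouping step can be tried separately from the plotting. The following is a rough sketch, assuming a Series indexed by category. The helper name `group_tail`, its `other_label` parameter and the `sales` numbers are all made up for illustration and do not come from the answer.

```python
import pandas as pd


def group_tail(values, threshold, other_label="OTHERS"):
    """Collapse categories past a cumulative-percentage threshold (hypothetical helper)."""
    ordered = values.sort_values(ascending=False)
    cum_pct = ordered.cumsum() / ordered.sum() * 100

    # Keep categories whose cumulative share is still below the threshold.
    mask = cum_pct < threshold
    out = pd.DataFrame({"value": ordered[mask], "cum_pct": cum_pct[mask]})

    rest = ordered.sum() - out["value"].sum()
    if rest > 0:
        # The merged bar closes the cumulative curve at 100%.
        tail = pd.DataFrame({"value": [rest], "cum_pct": [100.0]}, index=[other_label])
        out = pd.concat([out, tail])
    return out


# Made-up numbers, chosen so that several categories fall past the threshold.
sales = pd.Series({"A": 40.0, "B": 25.0, "C": 15.0, "D": 10.0, "E": 6.0, "F": 4.0})
result = group_tail(sales, threshold=70)
print(result)
```

With these numbers, A and B are kept (cumulative 40% and 65%), and C through F are merged into an OTHERS bar of 35. With the question's data, USA alone already accounts for about 90.8%, so a threshold of 70 would keep no individual category.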
