Faceted plot of a Pandas DataFrame with months on the x-axis

Posted 2024-05-19 07:05:00


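I have a data series of monthly sales for each fiscal year, stored in a pandas DataFrame. Each fiscal year starts on March 1 and ends on the last day of February of the following year. I'm using a plotly faceted plot to show the months of each year vertically aligned, so that March 2021 sits below March 2020, and so on.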

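Despite using a categorical variable for the x-axis, the ordering is slightly off. I tried sorting by a "yearmon" variable with unique values, but that didn't work either. Specifically, in the plot below the values for Jan and Feb in 2018 are blank, and the Jan and Feb values for 2021 are also misplaced. How can I get the facets to show continuous data without these problems?

Edit: I have a feeling it has to do with the order of the categories, but I haven't pinned it down yet.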

[Figure: Faceted plot using plotly and pandas dataframe]

import pandas as pd
import numpy as np
import plotly.express as px
import chart_studio.plotly as py

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-03-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

fig = px.bar(df, x = 'month', y = 'A', facet_col='year', facet_col_wrap=1)
py.image.save_as(fig, 'plotly.png', width=1000, height=500)

Update

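Building on @Vesland's code below, I adjusted the start date and the fiscal-year assignment based on the comments below, since fiscal years usually don't line up with calendar years. In addition, the length of the data series is arbitrary (it could be a few months or a decade), and so are its start and end months. Finally, I'd like the x-axis to start and end with the first and last months of the fiscal year, so in this example (March to February) "Mar" should be the first tick on the left and "Feb" the last tick on the right. Apologies if this isn't clear enough.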

import pandas as pd
import numpy as np
import plotly.express as px

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-01-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

df['fiscal_year'] = [2017]*2+[2018]*12+[2019]*12+[2020]*10
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year', facet_col_wrap=1)
fig.show()

This gives the following result:

[Figure: Plot using non-calendar fiscal year]


2 Answers

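The problem here seems to be that plotly doesn't respect the category order of the pandas series used for the x-axis unless it is explicitly told to, as pointed out on the plotly forum here and documented here. Passing category_orders to px.bar lets us override plotly's default ordering and build an x-axis that runs from the first month of the fiscal year to the last.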
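To see which order plotly falls back to, here is a small pandas-only sketch (plotly is not needed to run it). It assumes, per the plotly express documentation for category_orders, that without that argument categorical values are laid out in the order they are first encountered in the data frame. With the update's data starting in January, that first-seen order begins with "Jan", while the order declared on the pandas Categorical begins with "Mar". Using freq="MS" instead of "M" is an assumption made here to sidestep the "M" to "ME" alias change in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Same shape of data as the update: 36 months starting in January 2018.
# freq="MS" (month start) is used here instead of "M" to avoid the alias
# deprecation in newer pandas; only the month names matter for this check.
rng = np.random.default_rng(12345)
df = pd.DataFrame({"A": rng.integers(80, 100, size=36)},
                  index=pd.date_range("2018-01-01", periods=36, freq="MS"))

month_categories = ['Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug',
                    'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb']
df["month"] = pd.Categorical(df.index.strftime("%b"), categories=month_categories)

# Order in which month names first appear in the data
# (assumed to be plotly express's default when category_orders is absent).
first_seen = list(pd.unique(df["month"].astype(str)))

# Order declared on the pandas Categorical (the fiscal-year order we want).
declared = list(df["month"].cat.categories)

print(first_seen[:3])  # ['Jan', 'Feb', 'Mar']
print(declared[:3])    # ['Mar', 'Apr', 'May']
```

Since the two orders differ, handing the declared list to category_orders is what forces the axis to start at "Mar".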

import pandas as pd
import numpy as np
import plotly.express as px

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-01-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

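# fiscal-year label per row, hard-coded for this 36-month range (Jan/Feb 2018 belong to fiscal year 2017)
df['fiscal_year'] = [2017]*2+[2018]*12+[2019]*12+[2020]*10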

fig = px.bar(df, x='month', y='A',
             facet_col='fiscal_year',
             facet_col_wrap=1,
             # replaces plotly's default order (first appearance in the data) with the fiscal-year order
             category_orders={"month": month_categories})
fig.show()

[Figure: Faceted plot of pandas dataframe using ordered categories]
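The fiscal_year list above is hard-coded for exactly 36 months starting in January, while the update says the series can have any length and any start month. A minimal sketch of a generic alternative, assuming a fiscal year is labelled by the calendar year in which it starts; FISCAL_START_MONTH, fiscal_month_order and add_fiscal_columns are hypothetical names introduced here, not part of pandas or plotly.

```python
import calendar

import numpy as np
import pandas as pd

# Hypothetical setting: the fiscal year starts in March, as in the question.
FISCAL_START_MONTH = 3


def fiscal_month_order(start_month: int) -> list[str]:
    """Month abbreviations rotated so the fiscal year's first month comes first."""
    # calendar.month_abbr[0] is an empty string, so skip it; like strftime("%b"),
    # the names follow the current locale (English in the default C locale).
    months = list(calendar.month_abbr)[1:]
    return months[start_month - 1:] + months[:start_month - 1]


def add_fiscal_columns(df: pd.DataFrame, start_month: int) -> pd.DataFrame:
    """Return a copy of df (which must have a DatetimeIndex) with 'month' and 'fiscal_year'."""
    out = df.copy()
    out["month"] = pd.Categorical(out.index.strftime("%b"),
                                  categories=fiscal_month_order(start_month))
    # Months before the start month belong to the fiscal year that began
    # in the previous calendar year.
    before_start = (out.index.month < start_month).astype(int)
    out["fiscal_year"] = out.index.year - before_start
    return out


rng = np.random.default_rng(12345)
df = pd.DataFrame({"A": rng.integers(80, 100, size=36)},
                  index=pd.date_range("2018-01-01", periods=36, freq="MS"))
df = add_fiscal_columns(df, FISCAL_START_MONTH)
print(df["fiscal_year"].value_counts().sort_index())
```

For the update's data this reproduces the hard-coded labels (two rows in 2017, then 12, 12 and 10), but it keeps working for any date range. The list from fiscal_month_order(FISCAL_START_MONTH) can then be passed as category_orders={"month": ...} together with facet_col='fiscal_year'.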

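If I understand correctly, you seem to be doing everything right except for one small detail. That's a bit surprising, so there's a good chance I've misunderstood the premise of your question. Anyway: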

Specifically, in the plot below the values for Jan and Feb in 2018 are blank

That's because those dates don't exist in your data; as df.head() shows, it starts in March 2018:

             A  year month monthindex yearmon
2018-03-31  93  2018   Mar         03  201803
2018-04-30  84  2018   Apr         04  201804
2018-05-31  95  2018   May         05  201805
2018-06-30  86  2018   Jun         06  201806
2018-07-31  84  2018   Jul         07  201807
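A quick pandas-only check of why the calendar-year facets look broken for the question's original data (36 months starting in March 2018): counting rows per calendar year shows a short first facet and a two-month last facet. Counting by fiscal year, here computed by shifting each date back two months so that March becomes January, gives three full years. The two-month shift is an assumption tied to a March fiscal start, and freq="MS" replaces "M" only to avoid the alias change in newer pandas.

```python
import numpy as np
import pandas as pd

# The question's original data: 36 months starting in March 2018.
rng = np.random.default_rng(12345)
df = pd.DataFrame({"A": rng.integers(80, 100, size=36)},
                  index=pd.date_range("2018-03-01", periods=36, freq="MS"))

# Rows per calendar year, i.e. per facet when faceting on 'year':
# 2018 has no Jan/Feb, and Jan/Feb 2021 end up alone in a fourth facet.
calendar_counts = df.index.year.value_counts().sort_index()

# Shifting every date back two months maps March-February onto
# January-December, so .year of the shifted date is the fiscal year.
fiscal_year = (df.index - pd.DateOffset(months=2)).year
fiscal_counts = fiscal_year.value_counts().sort_index()

print(calendar_counts)
print(fiscal_counts)
```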

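If I understand your intention correctly, you actually want January and February of 2019 to be associated with the first x-axis. Despite your efforts, there is no such association in your data. I'm not sure how you'd prefer to do it, but if you specify: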

df['fiscal_year'] = [2018]*12+[2019]*12+[2020]*12

and get:

[Figure]

then you can run

fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year',facet_col_wrap=1)

and get:

[Figure]

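As you can see, January and February of 2019 now appear on the 2018 x-axis, and the same holds for the following years. I hope this is what you were looking for. If not, don't hesitate to let me know.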

Complete code:

import pandas as pd
import numpy as np
import plotly.express as px

rng = np.random.default_rng(12345)
df = pd.DataFrame(rng.integers(80, 100, size=(36, 1)), columns=list('A'))
df.index = pd.date_range("2018-03-01", periods=36, freq="M")
df['year'] = df.index.strftime('%Y')
df['month'] = df.index.strftime('%b')
df['monthindex'] = df.index.strftime('%m')
df['yearmon'] = df['year']+df['monthindex']

month_categories = ['Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb']
df["month"] = pd.Categorical(df["month"], categories = month_categories)
df = df.sort_values(by = "yearmon")

df['fiscal_year'] = [2018]*12+[2019]*12+[2020]*12
fig = px.bar(df, x = 'month', y = 'A', facet_col='fiscal_year', facet_col_wrap=1)
fig.show()
