Python stacked area chart

Posted 2024-09-30 22:19:14


I am trying to create a stacked area chart that shows how the courses and their counts change over time. My DataFrame looks like this (index = Year):

                    Area  Courses
Year                             
1900         Agriculture      0.0
1900        Architecture     32.0
1900           Astronomy     10.0
1900             Biology     20.0
1900           Chemistry     25.0
1900   Civil Engineering     21.0
1900           Education     14.0
1900  Engineering Design     10.0
1900             English     30.0
1900           Geography      1.0

The last year in the data is 2011.

I tried a few approaches, such as df.plot.area() and df.plot.area(x='Years'). Then I thought it would help to have the areas as columns, so I tried:

df.pivot_table(index = 'Year', columns = 'Area', values = 'Courses', aggfunc = 'sum')

But instead of the total number of courses per year, I got:

Area  Aeronautical Engineering  ...  Visual Design
Year                            ...               
1900                       NaN  ...            NaN
1901                       NaN  ...            NaN
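A note on this output: the NaN cells are most likely year/area combinations that have no rows in the data, rather than a failed sum. A minimal sketch, assuming a long table with the same `Year`, `Area` and `Courses` columns (the rows below are made-up toy data), fills those gaps with 0 through `fill_value`:

```python
import pandas as pd

# Toy long-format data (hypothetical values); Biology has no row for 1901.
long_df = pd.DataFrame({
    "Year": [1900, 1900, 1901],
    "Area": ["Biology", "Chemistry", "Chemistry"],
    "Courses": [20.0, 25.0, 27.0],
})

# One row per year, one column per area; missing combinations become 0.
wide_df = long_df.pivot_table(index="Year", columns="Area",
                              values="Courses", aggfunc="sum", fill_value=0)
print(wide_df)
```

With the table in this shape, `wide_df.plot.area()` should draw one stacked band per column.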

Thanks for your help. This is my first post, so apologies if I have missed anything.

Update: here is my code:

df = pd.read_csv(filepath, encoding= 'unicode_escape')
df = df.groupby(['Year','GenArea'])['Taught'].sum().to_frame(name = 'Courses').reset_index()
plt.stackplot(df['Year'], df['Courses'], labels = df['GenArea'])
plt.legend(loc='upper left')
plt.show()
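One likely reason this call does not produce a stacked chart: `plt.stackplot` expects x to hold each year once and y to hold one row per series, while the grouped frame repeats every year once per area. A sketch of the reshaping, using made-up toy data and the column names from the code above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Toy grouped data (hypothetical values), same layout as df after the groupby.
df = pd.DataFrame({
    "Year": [1900, 1900, 1901, 1901],
    "GenArea": ["Biology", "Chemistry", "Biology", "Chemistry"],
    "Courses": [20, 25, 22, 27],
})

# Years become the index, areas become the columns.
wide = df.pivot(index="Year", columns="GenArea", values="Courses").fillna(0)

fig, ax = plt.subplots()
# Transpose so that each row of y is one area's values over the years.
bands = ax.stackplot(wide.index, wide.to_numpy().T, labels=list(wide.columns))
ax.legend(loc="upper left")
```

Here `bands` holds one filled polygon collection per area, so the legend shows each area once.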

Here is a link to the dataset: https://data.world/makeovermonday/2020w12


1 answer
User
Answer 1 · Posted 2024-09-30 22:19:14

With the extra information, I put this together. I hope it helps.

import pandas as pd
import matplotlib.pyplot as plt

plt.close('all')

df=pd.read_csv('https://query.data.world/s/djx5mi7dociacx7smdk45pfmwp3vjo',
               encoding='unicode_escape')
df=df.groupby(['Year','GenArea'])['Taught'].sum().to_frame(name=
             'Courses').reset_index()
aux1=df.duplicated(subset='GenArea', keep='first').values
aux2=df.duplicated(subset='Year', keep='first').values

n=len(aux1);year=[];courses=[]

for i in range(n):
    if not aux1[i]:
        courses.append(df.iloc[i]['GenArea'])
    if not aux2[i]:
        year.append(df.iloc[i]['Year'])
    else:
        continue

del aux1,aux2
df1=pd.DataFrame(index=year)
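# Start every (year, area) cell at 0 so missing combinations plot as zero.
for i in range(len(courses)):
    df1[courses[i]]=0.0
# Write each total into the row of its own year. Picking the row by checking
# whether the current row still holds a 0 breaks when a real total is 0
# or when a year lacks an area.
for i in range(n):
    row=df.iloc[i]
    df1.at[row['Year'],row['GenArea']]=row['Courses']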

del year,courses,df
df1=df1[df1.columns[::-1]]
df1.plot.area(legend='reverse')
plt.show()

[Image: example]
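The loops in the answer build a table with one row per year and one column per area. Assuming that is the only goal, a shorter sketch gets a similar table with `unstack`. The toy rows below are made up, and `Taught` is summed per year and area as in the answer:

```python
import pandas as pd

# Toy raw data (hypothetical values) with the columns used in the answer.
raw = pd.DataFrame({
    "Year": [1900, 1900, 1900, 1901],
    "GenArea": ["Biology", "Biology", "Chemistry", "Chemistry"],
    "Taught": [12, 8, 25, 27],
})

# Sum per (Year, GenArea), then move GenArea into the columns;
# combinations with no rows are filled with 0.
wide = raw.groupby(["Year", "GenArea"])["Taught"].sum().unstack(fill_value=0)

# Same column reversal as in the answer.
wide = wide[wide.columns[::-1]]
print(wide)
```

Calling `wide.plot.area(legend='reverse')` on this table should then plot it the same way as `df1`.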
