I have a DataFrame in which start_time is a proper datetime column and start_station_name is a string column, as shown below:
start_time start_station_name
2019-03-20 11:04:16 San Francisco Caltrain (Townsend St at 4th St)
2019-04-06 14:19:06 Folsom St at 9th St
2019-05-24 17:21:11 Golden Gate Ave at Hyde St
2019-03-27 18:53:27 4th St at Mission Bay Blvd S
2019-04-16 08:45:16 Esprit Park
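Now I simply want to plot how often each station name occurs over the year, broken down by month. To group the data accordingly, I used the following: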
data = df_clean.groupby(df_clean['start_time'].dt.strftime('%B'))['start_station_name'].value_counts()
However, what I get back is not a DataFrame but a Series with dtype: int64:
start_time start_station_name
April San Francisco Caltrain Station 2 (Townsend St at 4th St) 4866
Market St at 10th St 4609
San Francisco Ferry Building (Harry Bridges Plaza) 4270
Berry St at 4th St 3994
Montgomery St BART Station (Market St at 2nd St) 3550
...
September Mission Bay Kids Park 1026
11th St at Natoma St 1023
Victoria Manalo Draves Park 1018
Davis St at Jackson St 1015
San Francisco Caltrain Station (King St at 4th St) 1014
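To see why this is not a DataFrame, the minimal sketch below reruns the same groupby on a few rows. Five rows come from the sample table above; the sixth row is hypothetical and was added only so that one (month, station) pair has a count above 1. The result is a Series whose two-level MultiIndex holds the month and the station name, with the counts as the values.

```python
import pandas as pd

# Toy stand-in for df_clean: the first five rows come from the sample in the
# question, the last row is hypothetical (added so one pair is counted twice)
df_clean = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-03-20 11:04:16", "2019-04-06 14:19:06", "2019-05-24 17:21:11",
        "2019-03-27 18:53:27", "2019-04-16 08:45:16", "2019-04-20 09:00:00",
    ]),
    "start_station_name": [
        "San Francisco Caltrain (Townsend St at 4th St)", "Folsom St at 9th St",
        "Golden Gate Ave at Hyde St", "4th St at Mission Bay Blvd S",
        "Esprit Park", "Folsom St at 9th St",
    ],
})

# Same expression as in the question
data = df_clean.groupby(df_clean["start_time"].dt.strftime("%B"))["start_station_name"].value_counts()

# A Series indexed by (month name, station name), not a DataFrame
print(type(data).__name__)
print(data.index.names)
print(data)
```

Because month and station live in the index rather than in columns, Seaborn cannot find a column called start_time, which matches the error described below.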
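Now I would like to use Seaborn's countplot() to draw this as a clustered bar chart, restricted to absolute frequencies above 1000, with the months on the x-axis, the station names as the hue, and the counts on the y-axis: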
sns.countplot(data = data[data > 1000], x = 'start_time', hue = 'start_station_name')
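But then I get the error message Could not interpret input 'start_time', probably because data is not a proper DataFrame. So, first of all: how should I group/aggregate the data so that the visualization works?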
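One way to get a plottable table, sketched here under the assumption that df_clean has the two columns shown above: call reset_index(name="count") on the value_counts result. This flattens the MultiIndex into ordinary start_time and start_station_name columns and stores the counts in a column named count, which can then be filtered. The helper name monthly_station_counts and the toy rows are hypothetical.

```python
import pandas as pd


def monthly_station_counts(df: pd.DataFrame, min_count: int = 1000) -> pd.DataFrame:
    """Hypothetical helper: trips per (month, station), keeping counts above min_count."""
    counts = (
        df.groupby(df["start_time"].dt.strftime("%B"))["start_station_name"]
        .value_counts()
        # Turn the (month, station) MultiIndex into columns; name the values "count"
        .reset_index(name="count")
    )
    return counts[counts["count"] > min_count].reset_index(drop=True)


# Toy rows in the question's format (the 2019-04-20 row is hypothetical)
df_clean = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-04-06 14:19:06", "2019-04-16 08:45:16",
        "2019-04-20 09:00:00", "2019-05-24 17:21:11",
    ]),
    "start_station_name": [
        "Folsom St at 9th St", "Esprit Park",
        "Folsom St at 9th St", "Golden Gate Ave at Hyde St",
    ],
})

# With the real data the default threshold of 1000 applies; 1 suits the toy rows
print(monthly_station_counts(df_clean, min_count=1))
```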
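Answer: the grouped result has to become a regular DataFrame before Seaborn can use it.

Explanation:
- Rename the value column produced by value_counts to count. (In older pandas versions this column is also named start_station_name, which clashes with the start_station_name key coming from the groupby.)
- Reset the index after the groupby, so that the month and the station name become ordinary columns that Seaborn can refer to by name.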
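Once the counts sit in a column, they are already aggregated, so a sketch of the plot would use sns.barplot with y="count" rather than sns.countplot: countplot counts rows itself and would draw a height of 1 per (month, station) row. The rows below are copied from the value_counts output in the question. The calendar-based month order is an added assumption, because strftime('%B') names otherwise sort alphabetically (April ... September).

```python
import calendar

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Tidy table in the shape produced by groupby(...).value_counts().reset_index(name="count");
# rows copied from the output shown in the question
counts = pd.DataFrame({
    "start_time": ["April", "April", "April", "September", "September"],
    "start_station_name": [
        "San Francisco Caltrain Station 2 (Townsend St at 4th St)",
        "Market St at 10th St",
        "Berry St at 4th St",
        "Mission Bay Kids Park",
        "11th St at Natoma St",
    ],
    "count": [4866, 4609, 3994, 1026, 1023],
})

# Keep only (month, station) pairs with more than 1000 trips
filtered = counts[counts["count"] > 1000]

# Calendar order for the months that actually occur after filtering
present = set(filtered["start_time"])
month_order = [m for m in calendar.month_name[1:] if m in present]

# barplot uses the precomputed heights in "count"; hue gives the clustered bars
ax = sns.barplot(
    data=filtered,
    x="start_time",
    y="count",
    hue="start_station_name",
    order=month_order,
)
ax.set_xlabel("Month")
ax.set_ylabel("Number of trips")
plt.tight_layout()
# Call plt.show() in a script to open the figure window
```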