Plotting time series data by first counting frequencies with pandas and matplotlib/seaborn

Posted 2024-06-26 03:21:42


I have a dataset of emails and their receivers, with timestamps.

For the top three emailers, I want to plot the number of unique emails by sender and by day, year, month, or week_year.
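Picking the top three emailers can reuse the same nunique count. A minimal sketch, assuming "top" means the most distinct email_ID values per sender (the question does not say whether rows or emails are meant); the frame below is made-up sample data, and unique_per_sender, top3 and df_top3 are illustrative names:

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    'sender': ['Able Boy', 'Able Boy', 'Able Boy', 'Able Boy',
               'scott kirk', 'scott kirk',
               'james joyce', 'james joyce', 'james joyce',
               'Mark L. Taylor'],
    'email_ID': ['<A1>', '<A1>', '<A2>', '<A3>',
                 '<S1>', '<S2>',
                 '<J1>', '<J1>', '<J2>',
                 '<M1>'],
})

# Distinct email IDs per sender: one email sent to several receivers
# appears on several rows but is counted once
unique_per_sender = df.groupby('sender')['email_ID'].nunique()

# Labels of the three senders with the most unique emails
top3 = unique_per_sender.nlargest(3).index

# Restrict the data to those senders before any per-period grouping
df_top3 = df[df['sender'].isin(top3)]
print(list(top3))
```

Filtering with isin first keeps every later summary limited to those three senders. When counts tie, nlargest keeps the label it meets first.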

As I see it, I first have to use groupby to compute the relevant summaries and then plot them. Or is there a way to do this directly?
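Summarising and then plotting is the usual route, but the summary can be a single call that already has the shape plot expects. A sketch, assuming a yearly view is wanted: pivot_table with aggfunc='nunique' puts years in rows and senders in columns, which DataFrame.plot draws directly. The sample rows are hypothetical, and the Agg backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical subset of the question's data; 'time' is epoch milliseconds
df = pd.DataFrame({
    'sender': ['Able Boy', 'Able Boy', 'Able Boy', 'Able Boy',
               'scott kirk', 'scott kirk', 'james joyce'],
    'email_ID': ['<A1>', '<A1>', '<A3>', '<A2>', '<S1>', '<S2>', '<J1>'],
    'time': pd.to_datetime([911929000000, 911929000000, 910228000000,
                            947943000000, 914261000000, 947943000000,
                            947943000000], unit='ms'),
})
df['year'] = df['time'].dt.year

# One call: rows = year, columns = sender, cells = distinct email IDs;
# sender/year pairs with no emails become 0 instead of NaN
wide = df.pivot_table(index='year', columns='sender', values='email_ID',
                      aggfunc='nunique', fill_value=0)

ax = wide.plot(kind='bar')  # one group of bars per year, one bar per sender
ax.set_ylabel('unique emails')
plt.tight_layout()
```

Using index='month', 'day' or 'week_year' instead gives the other views from the same pattern.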

Normally, when I don't need to compute frequencies first, I can plot the values directly, using the timestamp column as the index.
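To keep working from the timestamps themselves, pd.Grouper can bin the datetime column inside groupby, so helper columns such as year or month are not needed. A sketch, assuming calendar months are the target period ('MS' labels each bin by the first day of the month; 'D' or 'W' are other frequency aliases); the rows are a made-up subset of the question's data:

```python
import pandas as pd

# Hypothetical sample; 'time' is epoch milliseconds as in the question
df = pd.DataFrame({
    'sender': ['Able Boy', 'Able Boy', 'Able Boy', 'Able Boy',
               'scott kirk', 'scott kirk', 'james joyce'],
    'email_ID': ['<A1>', '<A1>', '<A3>', '<A2>', '<S1>', '<S2>', '<J1>'],
    'time': [911929000000, 911929000000, 910228000000, 947943000000,
             914261000000, 947943000000, 947943000000],
})
df['time'] = pd.to_datetime(df['time'], unit='ms')

# Group by sender and by month of the raw timestamp in one step;
# pd.Grouper does the binning, no 'month'/'year' columns required
monthly = (df.groupby(['sender', pd.Grouper(key='time', freq='MS')])['email_ID']
             .nunique())

# One column per sender, indexed by month start, ready for .plot()
monthly_wide = monthly.unstack('sender', fill_value=0)
print(monthly_wide)
```

Calling monthly_wide.plot(marker='o') would then draw one line per sender against real dates.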

Here, though, I'm stuck because the frequencies have to be computed first.

Here is what I have managed so far:

Set up the DataFrame:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df_in = pd.DataFrame({
'sender':['Able Boy','Able Boy','Able Boy','Mark L. Taylor','Mark L. Taylor',
    'Mark L. Taylor','scott kirk','scott kirk','scott kirk','scott kirk',
    'Able Boy','Able Boy','james h. madison','james h. madison','james h. madison',
    'james joyce','james joyce','james joyce','james joyce','james joyce',
    'scott kirk','Able Boy'],
'receiver':['Toni Z. Zapata','Mark Angel','Johnny C. Cash','paul a boyd','michelle fam',
    'debbie bradford','Mark Angel','Johnny C. Cash','Able Boy','Mark L. Taylor',
    'jenny chang','julie s. smith', 'hillary t. young', 'tiffany r.','Able Boy',
    'Mark Angel','Able Boy','julie s. smith','jenny chang','debbie bradford',
    'Able Boy','Toni Z. Zapata'],
'time':[911929000000,911929000000,910228000000,911497000000,911497000000,
    911932000000,914261000000,914267000000,914269000000,914276000000,
    914932000000,915901000000,916001000000,916001000000,916001000000,
    947943000000,947943000000,947943000000,947943000000,947943000000,
    916001000000,911929100000],
'email_ID':['<A34E5R>','<A34E5R>','<B34E5R>','<C34E5R>','<C34E5R>',
    '<C36E5R>','<C36E5A>','<C36E5B>','<C36E5C>','<C36E5D>',
    '<D000A0>','<D000A1>','<D000A2>','<D000A2>','<D000A2>',
    '<D000A3>','<D000A3>','<D000A3>','<D000A3>','<D000A3>',
    '<D000A4>','<A34E5S>']
})

Transform the DataFrame:

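df_1 = df_in.copy()
# 'time' holds Unix epoch milliseconds; convert it to datetimes first,
# otherwise .year / .week fail on plain integers
df_1['time'] = pd.to_datetime(df_1['time'], unit='ms')
df_1 = df_1.set_index('time')

# ISO calendar (year, week, day); year and week come from the same source
# so days around New Year land in a single week label
iso = df_1.index.isocalendar()
df_1['week_year'] = (iso['year'].astype(str) + '/' + iso['week'].astype(str)).to_numpy()

df_1['year'] = df_1.index.year
# DatetimeIndex.week was removed in pandas 2.0; isocalendar() replaces it
df_1['week'] = iso['week'].to_numpy()
df_1['date'] = df_1.index.date
df_1['hour'] = df_1.index.hour
df_1['day'] = df_1.index.day
df_1['month'] = df_1.index.month
# DatetimeIndex.weekday_name was removed in pandas 1.0; day_name() replaces it
df_1['weekday_name'] = df_1.index.day_name()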
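The year and the week number in a week_year label should come from the same calendar. ISO weeks can straddle New Year, so pairing the calendar year with the ISO week splits one week into two labels. A small check with three hand-picked dates (1 January 1999 is a Friday and belongs to ISO week 53 of 1998):

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(['1998-12-31', '1999-01-01', '1999-01-04']))

# DataFrame with columns year, week, day following ISO 8601
iso = ts.dt.isocalendar()

# Calendar year + ISO week: 31 Dec and 1 Jan get different labels
naive = ts.dt.year.astype(str) + '/' + iso['week'].astype(str)

# ISO year + ISO week: both days stay in the same bucket
consistent = iso['year'].astype(str) + '/' + iso['week'].astype(str)

print(pd.DataFrame({'naive': naive, 'iso': consistent}))
```

DatetimeIndex.isocalendar() returns the same year/week/day columns when the timestamps are the index.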

df_grp_1 = (df_1.groupby(['sender','year']).email_ID.nunique())
print("\nGROUP BY SENDER AND YEAR:")
print(df_grp_1)
print(type(df_grp_1))

df_grp_2 = (df_1.groupby(['sender','date']).email_ID.nunique())
print("\nGROUP BY SENDER AND DATE:")
print(df_grp_2)
print(type(df_grp_2))

df_grp_3 = (df_1.groupby(['sender','week_year']).email_ID.nunique())
print("\nGROUP BY SENDER AND week_year:")
print(df_grp_3)
print(type(df_grp_3))
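Once a grouped Series like df_grp_3 exists, plotting is mostly a reshape: unstack the sender level into columns and call plot. A sketch in which df_grp is a hand-built stand-in with the same (sender, week_year) MultiIndex, not real output from the question's data:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_grp_3: unique emails per (sender, week_year)
df_grp = pd.Series(
    [2, 1, 1, 1],
    index=pd.MultiIndex.from_tuples(
        [('Able Boy', '1998/48'), ('Able Boy', '2000/2'),
         ('scott kirk', '1998/52'), ('scott kirk', '2000/2')],
        names=['sender', 'week_year']),
    name='email_ID')

# Move 'sender' from the index into columns; missing sender/week
# combinations become 0 instead of NaN
table = df_grp.unstack('sender', fill_value=0)

fig, ax = plt.subplots()
table.plot(ax=ax, marker='o')  # one line per sender
ax.set_xlabel('week_year')
ax.set_ylabel('unique emails')
fig.tight_layout()
```

Because week_year is a string, '1998/9' sorts after '1998/48'; zero-padding the week (for example '1998/09') keeps the x axis in time order. With seaborn, sns.lineplot(data=df_grp.reset_index(), x='week_year', y='email_ID', hue='sender') should draw the same long-format data without unstacking.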
