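I have a dataset of emails with timestamps and recipients.
For the top three emailers, I want to plot the number of unique emails by sender and by each of day, year, month, and week_year.
It seems to me that I first have to use groupby to compute the relevant summary and then plot that summary. Or is there a way to do this directly?
Usually, when I don't have to compute frequencies first, I can plot the values directly with the timestamp column as the index.
Here I am stuck, because the frequencies have to be computed first.
This is what I have so far: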
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_in = pd.DataFrame({
'sender':['Able Boy','Able Boy','Able Boy','Mark L. Taylor','Mark L. Taylor',
'Mark L. Taylor','scott kirk','scott kirk','scott kirk','scott kirk',
'Able Boy','Able Boy','james h. madison','james h. madison','james h. madison',
'james joyce','james joyce','james joyce','james joyce','james joyce',
'scott kirk','Able Boy'],
'receiver':['Toni Z. Zapata','Mark Angel','Johnny C. Cash','paul a boyd','michelle fam',
'debbie bradford','Mark Angel','Johnny C. Cash','Able Boy','Mark L. Taylor',
'jenny chang','julie s. smith', 'hillary t. young', 'tiffany r.','Able Boy',
'Mark Angel','Able Boy','julie s. smith','jenny chang','debbie bradford',
'Able Boy','Toni Z. Zapata'],
'time':[911929000000,911929000000,910228000000,911497000000,911497000000,
911932000000,914261000000,914267000000,914269000000,914276000000,
914932000000,915901000000,916001000000,916001000000,916001000000,
947943000000,947943000000,947943000000,947943000000,947943000000,
916001000000,911929100000],
'email_ID':['<A34E5R>','<A34E5R>','<B34E5R>','<C34E5R>','<C34E5R>',
'<C36E5R>','<C36E5A>','<C36E5B>','<C36E5C>','<C36E5D>',
'<D000A0>','<D000A1>','<D000A2>','<D000A2>','<D000A2>',
'<D000A3>','<D000A3>','<D000A3>','<D000A3>','<D000A3>',
'<D000A4>','<A34E5S>']
})
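df_1 = df_in.copy()
# 'time' holds epoch milliseconds; convert to datetime before using .year, .week, etc.
df_1['time'] = pd.to_datetime(df_1['time'], unit='ms')
df_1['week_year'] = df_1['time'].apply(lambda x: "%d/%d" % (x.year, x.week))
df_1 = df_1.set_index('time')
df_1['year'] = df_1.index.year
# DatetimeIndex.week was removed in pandas 2.0; isocalendar().week replaces it
df_1['week'] = df_1.index.isocalendar().week
df_1['date'] = df_1.index.date
df_1['hour'] = df_1.index.hour
df_1['day'] = df_1.index.day
df_1['month'] = df_1.index.month
# weekday_name was removed in pandas 1.0; day_name() replaces it
df_1['weekday_name'] = df_1.index.day_name()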
df_grp_1 = (df_1.groupby(['sender','year']).email_ID.nunique())
print("\nGROUP BY SENDER AND YEAR:")
print(df_grp_1)
print(type(df_grp_1))
df_grp_2 = (df_1.groupby(['sender','date']).email_ID.nunique())
print("\nGROUP BY SENDER AND DATE:")
print(df_grp_2)
print(type(df_grp_2))
df_grp_3 = (df_1.groupby(['sender','week_year']).email_ID.nunique())
print("\nGROUP BY SENDER AND week_year:")
print(df_grp_3)
print(type(df_grp_3))
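One possible way to go from these grouped counts to a plot is sketched below. It is only a sketch under some assumptions: "top three emailers" is taken to mean the three senders with the most unique email_ID values, the sample frame is a small hypothetical subset shaped like df_in, and the helper names unique_counts and plot_counts are made up for illustration. The idea is to group by the time key and the sender, count unique IDs, and unstack the sender level so each sender becomes one column that DataFrame.plot can draw.

```python
import pandas as pd

# Hypothetical sample in the same shape as df_in (epoch-millisecond timestamps).
df = pd.DataFrame({
    'sender': ['Able Boy', 'Able Boy', 'Able Boy', 'scott kirk', 'scott kirk',
               'Mark L. Taylor', 'Mark L. Taylor', 'james joyce'],
    'time': [911929000000, 911929000000, 910228000000, 914261000000, 914267000000,
             911497000000, 911932000000, 947943000000],
    'email_ID': ['<A34E5R>', '<A34E5R>', '<B34E5R>', '<C36E5A>', '<C36E5B>',
                 '<C34E5R>', '<C36E5R>', '<D000A3>'],
})
df['time'] = pd.to_datetime(df['time'], unit='ms')
df['year'] = df['time'].dt.year


def unique_counts(frame, by, top_n=3):
    """Unique email_ID counts with `by` values as rows and top senders as columns."""
    # Assumption: "top" means most unique emails sent.
    top = frame.groupby('sender')['email_ID'].nunique().nlargest(top_n).index
    subset = frame[frame['sender'].isin(top)]
    # Unstacking the sender level gives one column per sender, ready for plotting.
    return subset.groupby([by, 'sender'])['email_ID'].nunique().unstack(fill_value=0)


def plot_counts(counts, title):
    """Draw one group of bars per row of `counts`, one bar per sender."""
    import matplotlib.pyplot as plt  # imported here so the counting step needs only pandas
    ax = counts.plot(kind='bar', title=title)
    ax.set_ylabel('unique emails')
    plt.tight_layout()
    plt.show()


counts_by_year = unique_counts(df, 'year')
print(counts_by_year)
```

Calling plot_counts(counts_by_year, 'Unique emails per year') would then draw the chart. The same helper should work for the other keys (date, month, week_year) once those columns exist, as they do on df_1 above.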