如何按时间和时间分组?

2024-09-29 06:31:03 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图从pandas DateTime列中获取每天和每小时的事件数

资料

import pandas as pd
timeData = [
    '2009/6/12 2:00', '2009/6/12 3:00', '2009/6/12 4:00', '2009/6/12 5:00', '2009/6/12 6:00', '2009/6/12 7:00', '2009/6/12 8:00', '2009/6/12 9:00', '2009/6/12 10:00', '2009/6/12 11:00', '2009/6/12 12:00', '2009/6/12 13:00', '2009/6/12 14:00', '2009/6/12 15:00', '2009/6/12 16:00', '2009/6/12 17:00', '2009/6/12 18:00', '2009/6/12 19:00', '2009/6/12 20:00', '2009/6/12 21:00', '2009/6/12 22:00', '2009/6/12 23:00',
    '2009/6/13 0:00', '2009/6/13 1:00', '2009/6/13 2:00', '2009/6/13 3:00', '2009/6/13 4:00', '2009/6/13 5:00', '2009/6/13 6:00', '2009/6/13 7:00', '2009/6/13 8:00', '2009/6/13 9:00', '2009/6/13 10:00', '2009/6/13 11:00', '2009/6/13 12:00', '2009/6/13 13:00', '2009/6/13 14:00', '2009/6/13 15:00', '2009/6/13 16:00', '2009/6/13 17:00', '2009/6/13 18:00', '2009/6/13 19:00', '2009/6/13 20:00', '2009/6/13 21:00', '2009/6/13 22:00', '2009/6/13 23:00',
    '2009/6/14 0:00', '2009/6/14 1:00', '2009/6/14 2:00', '2009/6/14 3:00', '2009/6/14 4:00', '2009/6/14 5:00', '2009/6/14 6:00', '2009/6/14 7:00', '2009/6/14 8:00', '2009/6/14 9:00', '2009/6/14 10:00', '2009/6/14 11:00', '2009/6/14 12:00', '2009/6/14 13:00', '2009/6/14 14:00', '2009/6/14 15:00', '2009/6/14 16:00', '2009/6/14 17:00', '2009/6/14 18:00', '2009/6/14 19:00', '2009/6/14 20:00', '2009/6/14 21:00', '2009/6/14 22:00', '2009/6/14 23:00',
    '2009/6/15 0:00', '2009/6/15 1:00', '2009/6/15 2:00', '2009/6/15 3:00', '2009/6/15 4:00', '2009/6/15 5:00', '2009/6/15 6:00', '2009/6/15 7:00', '2009/6/15 8:00', '2009/6/15 9:00', '2009/6/15 10:00', '2009/6/15 11:00', '2009/6/15 12:00', '2009/6/15 13:00', '2009/6/15 14:00', '2009/6/15 15:00', '2009/6/15 16:00', '2009/6/15 17:00', '2009/6/15 18:00', '2009/6/15 19:00', '2009/6/15 20:00', '2009/6/15 21:00', '2009/6/15 22:00', '2009/6/15 23:00',
    '2009/6/15 0:00', '2009/6/16 1:00', '2009/6/16 2:00', '2009/6/16 3:00', '2009/6/16 4:00', '2009/6/16 5:00', '2009/6/16 6:00', '2009/6/16 7:00', '2009/6/16 8:00', '2009/6/16 9:00', '2009/6/16 10:00', '2009/6/16 11:00', '2009/6/16 12:00', '2009/6/16 13:00', '2009/6/16 14:00', '2009/6/16 15:00', '2009/6/16 16:00', '2009/6/16 17:00', '2009/6/16 18:00', '2009/6/16 19:00', '2009/6/16 20:00', '2009/6/16 21:00', '2009/6/16 22:00', '2009/6/16 23:00',
    '2009/6/15 0:00', '2009/6/17 1:00', '2009/6/17 2:00', '2009/6/17 3:00', '2009/6/17 4:00', '2009/6/17 5:00', '2009/6/17 6:00', '2009/6/17 7:00', '2009/6/17 8:00', '2009/6/17 9:00', '2009/6/17 10:00', '2009/6/17 11:00', '2009/6/17 12:00', '2009/6/17 13:00', '2009/6/17 14:00', '2009/6/17 15:00', '2009/6/17 16:00', '2009/6/17 17:00', '2009/6/17 18:00', '2009/6/17 19:00', '2009/6/17 20:00', '2009/6/17 21:00', '2009/6/17 22:00', '2009/6/17 23:00',
    '2009/6/18 0:00', '2009/6/18 1:00', '2009/6/18 2:00', '2009/6/18 3:00', '2009/6/18 4:00', '2009/6/18 5:00', '2009/6/18 6:00', '2009/6/18 7:00', '2009/6/18 8:00', '2009/6/18 9:00', '2009/6/18 10:00', '2009/6/18 11:00', '2009/6/18 12:00', '2009/6/18 13:00', '2009/6/18 14:00', '2009/6/18 15:00', '2009/6/18 16:00', '2009/6/18 17:00', '2009/6/18 18:00', '2009/6/18 19:00', '2009/6/18 20:00', '2009/6/18 21:00', '2009/6/18 22:00', '2009/6/18 23:00',
    '2009/6/15 0:00', '2009/6/19 1:00', '2009/6/19 2:00', '2009/6/19 3:00', '2009/6/19 4:00', '2009/6/19 5:00', '2009/6/19 6:00', '2009/6/19 7:00', '2009/6/19 8:00', '2009/6/19 9:00', '2009/6/19 10:00', '2009/6/19 11:00', '2009/6/19 12:00', '2009/6/19 13:00', '2009/6/19 14:00', '2009/6/19 15:00', '2009/6/19 16:00', '2009/6/19 17:00', '2009/6/19 18:00', '2009/6/19 19:00', '2009/6/19 20:00', '2009/6/19 21:00', '2009/6/19 22:00', '2009/6/19 23:00',
    '2009/6/20 0:00', '2009/6/20 1:00', '2009/6/20 2:00', '2009/6/20 3:00', '2009/6/20 4:00', '2009/6/20 5:00', '2009/6/20 6:00', '2009/6/20 7:00', '2009/6/20 8:00', '2009/6/20 9:00', '2009/6/20 10:00', '2009/6/20 11:00', '2009/6/20 12:00', '2009/6/20 13:00', '2009/6/20 14:00', '2009/6/20 15:00', '2009/6/20 16:00', '2009/6/20 17:00', '2009/6/20 18:00', '2009/6/20 19:00', '2009/6/20 20:00', '2009/6/20 21:00', '2009/6/20 22:00', '2009/6/20 23:00',
    '2009/6/21 0:00', '2009/6/21 1:00', '2009/6/21 2:00', '2009/6/21 3:00', '2009/6/21 4:00', '2009/6/21 5:00', '2009/6/21 6:00', '2009/6/21 7:00', '2009/6/21 8:00', '2009/6/21 9:00', '2009/6/21 10:00', '2009/6/21 11:00', '2009/6/21 12:00', '2009/6/21 13:00', '2009/6/21 14:00', '2009/6/21 15:00', '2009/6/21 16:00', '2009/6/21 17:00', '2009/6/21 18:00', '2009/6/21 19:00', '2009/6/21 20:00', '2009/6/21 21:00', '2009/6/21 22:00', '2009/6/21 23:00',
    '2009/6/22 0:00', '2009/6/22 1:00', '2009/6/22 2:00', '2009/6/22 3:00', '2009/6/22 4:00', '2009/6/22 5:00', '2009/6/22 6:00', '2009/6/22 7:00', '2009/6/22 8:00', '2009/6/22 9:00', '2009/6/22 10:00', '2009/6/22 11:00', '2009/6/22 12:00', '2009/6/22 13:00', '2009/6/22 14:00', '2009/6/22 15:00', '2009/6/22 16:00', '2009/6/22 17:00', '2009/6/22 18:00', '2009/6/22 19:00', '2009/6/22 20:00', '2009/6/22 21:00', '2009/6/22 22:00', '2009/6/22 23:00',
    '2009/6/23 0:00', '2009/6/23 1:00', '2009/6/23 2:00', '2009/6/23 3:00', '2009/6/23 4:00']


df = pd.DataFrame({'Timestamp': timeData})

df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y/%m/%d %H:%M")

所需输出

hours = ['12a', '1a', '2a', '3a', '4a', '5a', '6a',
        '7a', '8a', '9a','10a','11a',
        '12p', '1p', '2p', '3p', '4p', '5p',
        '6p', '7p', '8p', '9p', '10p', '11p'];
days = ['Saturday', 'Friday', 'Thursday',
        'Wednesday', 'Tuesday', 'Monday', 'Sunday']

output = pd.DataFrame(columns=[hours])

output["Day"] = days

所需输出带值

  Day 12a   1a  2a  3a  4a  5a  6a  7a  8a  9a  10a 11a 12p 1p  2p  3p  4p  5p  6p  7p  8p  9p  10p 11p 
Saturday
Friday
Thursday
Wednesday
Tuesday
Monday
Sunday

在python中是否有一种方法可以将数据分组为每天每小时一次


Tags: dataframepandasdfdaystimestamppd小时monday
2条回答

您可以使用:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['day'] = df['Timestamp'].dt.day_name()
df['hour'] = df['Timestamp'].dt.strftime('%I%p').str[:-1].str.lower().str.strip('0')
hours = df['hour'].unique()
df.groupby(['day', 'hour']).count().unstack().droplevel(0, axis=1).reindex(hours, axis=1)

输出:

hour        2a   3a   4a   5a   6a   7a   8a   9a  10a  11a  12p   1p   2p   3p   4p   5p   6p   7p   8p   9p  10p  11p  12a   1a
day                                                                                                                              
Friday     2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  NaN  1.0
Monday     2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  5.0  2.0
Saturday   2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
Sunday     2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
Thursday   1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
Tuesday    2.0  2.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0
Wednesday  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  NaN  1.0                                                                         

或者,要填充NAs:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['day'] = df['Timestamp'].dt.day_name()
df['hour'] = df['Timestamp'].dt.strftime('%I%p').str[:-1].str.lower().str.strip('0')
hours = df['hour'].unique()
df.groupby(['day', 'hour']).count().unstack(fill_value=0).droplevel(0, axis=1).reindex(hours, axis=1)
hour       2a  3a  4a  5a  6a  7a  8a  9a  10a  11a  12p  1p  2p  3p  4p  5p  6p  7p  8p  9p  10p  11p  12a  1a
day                                                                                                            
Friday      2   2   2   2   2   2   2   2    2    2    2   2   2   2   2   2   2   2   2   2    2    2    0   1
Monday      2   2   2   2   2   2   2   2    2    2    2   2   2   2   2   2   2   2   2   2    2    2    5   2
Saturday    2   2   2   2   2   2   2   2    2    2    2   2   2   2   2   2   2   2   2   2    2    2    2   2
Sunday      2   2   2   2   2   2   2   2    2    2    2   2   2   2   2   2   2   2   2   2    2    2    2   2
Thursday    1   1   1   1   1   1   1   1    1    1    1   1   1   1   1   1   1   1   1   1    1    1    1   1
Tuesday     2   2   2   1   1   1   1   1    1    1    1   1   1   1   1   1   1   1   1   1    1    1    1   2
Wednesday   1   1   1   1   1   1   1   1    1    1    1   1   1   1   1   1   1   1   1   1    1    1    0   1

使用^{}%I%p12h格式使用^{}几个小时,然后需要一些处理来删除最后一个m,小写和删除第一个0,对于天使用^{}并传递给^{},使用原始顺序的最后一个^{}

hours = df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0')
days = df["Timestamp"].dt.day_name()

df = pd.crosstab(days, hours).reindex(index=days.unique(), columns=hours.unique())

具有指定列的可选项:

df['hours'] = df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0')
df['days']  = df["Timestamp"].dt.day_name()

df = pd.crosstab(df['days'], df['hours']).reindex(index=df['days'].unique(), 
                                                  columns=df['hours'].unique())

print (df)
Timestamp  2a  3a  4a  5a  6a  7a  8a  9a  10a  11a  ...  4p  5p  6p  7p  8p  \
Timestamp                                            ...                       
Friday      2   2   2   2   2   2   2   2    2    2  ...   2   2   2   2   2   
Saturday    2   2   2   2   2   2   2   2    2    2  ...   2   2   2   2   2   
Sunday      2   2   2   2   2   2   2   2    2    2  ...   2   2   2   2   2   
Monday      2   2   2   2   2   2   2   2    2    2  ...   2   2   2   2   2   
Tuesday     2   2   2   1   1   1   1   1    1    1  ...   1   1   1   1   1   
Wednesday   1   1   1   1   1   1   1   1    1    1  ...   1   1   1   1   1   
Thursday    1   1   1   1   1   1   1   1    1    1  ...   1   1   1   1   1   

Timestamp  9p  10p  11p  12a  1a  
Timestamp                         
Friday      2    2    2    0   1  
Saturday    2    2    2    2   2  
Sunday      2    2    2    2   2  
Monday      2    2    2    5   2  
Tuesday     1    1    1    1   2  
Wednesday   1    1    1    0   1  
Thursday    1    1    1    1   1  

[7 rows x 24 columns]

如果需要在列表中定义自定义顺序,请使用ordered Categoricals:

hourscat = ['12a', '1a', '2a', '3a', '4a', '5a', '6a',
        '7a', '8a', '9a','10a','11a',
        '12p', '1p', '2p', '3p', '4p', '5p',
        '6p', '7p', '8p', '9p', '10p', '11p'];
dayscat = ['Saturday', 'Friday', 'Thursday',
        'Wednesday', 'Tuesday', 'Monday', 'Sunday']

hours = pd.Categorical(df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0'), ordered=True, categories=hourscat)
days = pd.Categorical(df["Timestamp"].dt.day_name(), ordered=True, categories=dayscat)

df = pd.crosstab(days, hours)

print (df)
col_0      12a  1a  2a  3a  4a  5a  6a  7a  8a  9a  ...  2p  3p  4p  5p  6p  \
row_0                                               ...                       
Saturday     2   2   2   2   2   2   2   2   2   2  ...   2   2   2   2   2   
Friday       0   1   2   2   2   2   2   2   2   2  ...   2   2   2   2   2   
Thursday     1   1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   
Wednesday    0   1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   
Tuesday      1   2   2   2   2   1   1   1   1   1  ...   1   1   1   1   1   
Monday       5   2   2   2   2   2   2   2   2   2  ...   2   2   2   2   2   
Sunday       2   2   2   2   2   2   2   2   2   2  ...   2   2   2   2   2   

col_0      7p  8p  9p  10p  11p  
row_0                            
Saturday    2   2   2    2    2  
Friday      2   2   2    2    2  
Thursday    1   1   1    1    1  
Wednesday   1   1   1    1    1  
Tuesday     1   1   1    1    1  
Monday      2   2   2    2    2  
Sunday      2   2   2    2    2  

[7 rows x 24 columns]
        

相关问题 更多 >