Groupby agg平均值和计数,包括日期时间值

2024-06-29 01:05:26 发布

您现在位置:Python中文网/ 问答频道 /正文

主楼

product_id viewed_date  viewed_storeA    viewed_storeB    viewed_storeA_first  time_delta   
323224     2019-04-01   2019-04-01 08:01 2019-04-01 08:20 True                 00:19:00
942234     2019-04-01   2019-04-01 08:13 2019-04-01 08:43 True                 00:30:00
424244     2019-04-01   2019-04-01 07:20 2019-04-01 08:20 True                 01:00:00
749249     2019-04-02   2019-04-02 06:00 2019-04-02 07:30 True                 01:30:00
224345     2019-04-02   2019-04-02 06:00 2019-04-02 08:00 True                 02:00:00

期望输出测向

viewed_date  viewed_storeA_first_count time_delta_mean 
2019-04-01   3                         00:36:00
2019-04-02   2                         01:05:00

这是我迄今为止尝试过的,但我得到了以下错误:No numeric types to aggregate

df_grouped = df.groupby('viewed_date') \
       .agg({'viewed_storeA_first':'count', 'time_delta':'mean'}) \
       .rename(columns={'viewed_storeA_first':'viewed_storeA_first_count','time_delta':'time_delta_mean'}) \
       .reset_index()

Tags: idtruedfdatetimecount错误product
1条回答
网友
1楼 · 发布于 2024-06-29 01:05:26
  • 当前,time_delta列是timedelta64数据类型,但要执行聚合函数,它必须是整数
import pandas as pd

data = {'product_id': [323224, 942234, 424244, 749249, 224345],
        'viewed_date': ['2019-04-01', '2019-04-01', '2019-04-01', '2019-04-02', '2019-04-02'],
        'viewed_storeA': ['2019-04-01 08:01', '2019-04-01 08:13', '2019-04-01 07:20', '2019-04-02 06:00', '2019-04-02 06:00'],
        'viewed_storeB': ['2019-04-01 08:20', '2019-04-01 08:43', '2019-04-01 08:20', '2019-04-02 07:30', '2019-04-02 08:00'],
        'viewed_storeA_first': [True, True, True, True, True]}

df = pd.DataFrame(data)

# convert columns to datetime
for col in df.columns[1:4]:
    df[col] = pd.to_datetime(df[col])

# calculate time_delta as an int
df['time_delta'] = (df.viewed_storeB - df.viewed_storeA).astype('timedelta64[m]')

# groupby and perform aggregations
dfg = df.groupby(['viewed_date']).agg({'viewed_storeA_first': 'count', 'time_delta': 'mean'}).rename(columns={'viewed_storeA_first':'viewed_storeA_first_count','time_delta':'time_delta_mean'}).reset_index()

# optional: round the minutes
dfg.time_delta_mean = dfg.time_delta_mean.apply(round)

# convert time_delta_mean back to a timedelta
dfg.time_delta_mean = pd.to_timedelta(dfg.time_delta_mean, unit='m')

# display(dfg)
  viewed_date  viewed_storeA_first_count time_delta_mean
0  2019-04-01                          3        00:36:00
1  2019-04-02                          2        01:45:00

相关问题 更多 >