Split one row into multiple hourly rows based on datetime columns

email         room   Start Date           End Date
abc@corp.com  L11M2  2021-02-01 08:00:00  2021-02-01 09:00:00
abc@corp.com  L11M2  2021-02-01 09:00:00  2021-02-01 10:00:00
abc@corp.com  L11M2  2021-02-01 10:00:00  2021-02-01 11:00:00
xyz@corp.com  L12M4  2021-02-01 08:00:00  2021-02-01 09:00:00
xyz@corp.com  L12M4  2021-02-01 09:00:00  2021-02-01 10:00:00

3 Answers

User

#1 · edited 2024-09-27 23:26:30

Here is a simple solution using pandas.date_range and explode:

df['Start Date'] = df.apply(lambda d: pd.date_range(d['Start Date'],
                                                    d['End Date'], 
                                                    freq='h')[:-1], 
                            axis=1)
df = df.explode('Start Date')
df['End Date'] = df['Start Date'] + pd.Timedelta('1h')

Output:

          email   room          Start Date            End Date
0  abc@corp.com  L11M2 2021-02-01 08:00:00 2021-02-01 09:00:00
0  abc@corp.com  L11M2 2021-02-01 09:00:00 2021-02-01 10:00:00
0  abc@corp.com  L11M2 2021-02-01 10:00:00 2021-02-01 11:00:00
1  xyz@corp.com  L12M4 2021-02-01 08:00:00 2021-02-01 09:00:00
1  xyz@corp.com  L12M4 2021-02-01 09:00:00 2021-02-01 10:00:00
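
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same date_range plus explode idea. The input frame is an assumption reconstructed from the expected output in the question (one booking per row, 08:00 to 11:00 and 08:00 to 10:00), and `split_hourly` is a hypothetical helper name, not something pandas provides.

```python
import pandas as pd

# Assumed input: one row per booking, reconstructed from the expected output.
df = pd.DataFrame({
    "email": ["abc@corp.com", "xyz@corp.com"],
    "room": ["L11M2", "L12M4"],
    "Start Date": pd.to_datetime(["2021-02-01 08:00:00", "2021-02-01 08:00:00"]),
    "End Date": pd.to_datetime(["2021-02-01 11:00:00", "2021-02-01 10:00:00"]),
})


def split_hourly(frame):
    """Hypothetical helper: one output row per hour between Start Date and End Date."""
    out = frame.copy()
    # date_range includes both endpoints, so drop the last stamp:
    # the end of a booking is not the start of another hourly slot.
    out["Start Date"] = pd.Series(
        [
            list(pd.date_range(start, end, freq="h")[:-1])
            for start, end in zip(frame["Start Date"], frame["End Date"])
        ],
        index=out.index,
    )
    out = out.explode("Start Date")
    # explode leaves an object column, so convert it back to datetime64.
    out["Start Date"] = pd.to_datetime(out["Start Date"])
    out["End Date"] = out["Start Date"] + pd.Timedelta(hours=1)
    return out.reset_index(drop=True)


hourly = split_hourly(df)
print(hourly)
```

Unlike the snippet above, this version resets the index at the end, so the rows are numbered 0 to 4 instead of repeating the original row labels.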

User

#2 · edited 2024-09-27 23:26:30

Let's create some sample data:

from datetime import datetime, timedelta

import pandas as pd

# Reference time: the current time truncated to the hour
ref = datetime.now().replace(minute=0, second=0, microsecond=0)

def shifted(i):
    # Return the reference time shifted by i hours
    return ref + timedelta(hours=i)

df = pd.DataFrame([
    ('A', 'B', shifted(1), shifted(10)),
    ('C', 'D', shifted(-5), shifted(-1))],
    columns=['name', 'email', 'start', 'end'])

The data looks like this:

  name email               start                 end
0    A     B 2021-08-27 12:00:00 2021-08-27 21:00:00
1    C     D 2021-08-27 06:00:00 2021-08-27 10:00:00

You can split each row with apply, making sure the function returns a pd.Series:

new_start = df.apply(lambda row: pd.Series(pd.date_range(row.start, row.end, freq='H')), axis=1).stack()

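After this, new_start holds the start of each hour. It has a two-level index: the first level is the original row index and the second is the position within that row's block, which may also come in handy.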

0  0   2021-08-27 12:00:00
   1   2021-08-27 13:00:00
   2   2021-08-27 14:00:00
   3   2021-08-27 15:00:00
   4   2021-08-27 16:00:00
   5   2021-08-27 17:00:00
   6   2021-08-27 18:00:00
   7   2021-08-27 19:00:00
   8   2021-08-27 20:00:00
   9   2021-08-27 21:00:00
1  0   2021-08-27 06:00:00
   1   2021-08-27 07:00:00
   2   2021-08-27 08:00:00
   3   2021-08-27 09:00:00
   4   2021-08-27 10:00:00
dtype: datetime64[ns]
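
If the position within each block is worth keeping, one option is to name the two index levels and turn them into ordinary columns. This is only a sketch: the small `new_start` built here is a hypothetical stand-in for the real one, and the level names `row` and `slot` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for new_start: hourly stamps keyed by
# (original row label, position within that row's block).
new_start = pd.concat({
    0: pd.Series(pd.date_range("2021-08-27 12:00", periods=3, freq="h")),
    1: pd.Series(pd.date_range("2021-08-27 06:00", periods=2, freq="h")),
})

# Name both index levels, then move them into regular columns.
slots = new_start.rename_axis(["row", "slot"]).reset_index(name="start")
print(slots)
```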

Now just join it back to the original frame:

res = df[["name", "email"]].join(
new_start.reset_index(1, drop=True).rename("start"))

You can add the end column like this:

res["end"] = res.start + timedelta(hours=1)
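
One thing to watch: pd.date_range includes both endpoints, so in the listing above the original end (21:00 for row 0) also shows up as a start, and its end would land one hour past the original end. A minimal sketch of one way to avoid that, assuming pandas 1.4 or later, where date_range accepts inclusive='left'. The fixed timestamps are sample values chosen to match the listing, and the per-row ranges are built with a dict and pd.concat rather than apply and stack, which is a swapped-in technique and not the answer's own code.

```python
import pandas as pd

# Sample frame with fixed timestamps (values chosen to match the listing above).
df = pd.DataFrame(
    [
        ("A", "B", pd.Timestamp("2021-08-27 12:00"), pd.Timestamp("2021-08-27 21:00")),
        ("C", "D", pd.Timestamp("2021-08-27 06:00"), pd.Timestamp("2021-08-27 10:00")),
    ],
    columns=["name", "email", "start", "end"],
)

# One Series of hourly starts per original row; inclusive="left" keeps the
# range start and leaves out the range end.
pieces = {
    idx: pd.Series(pd.date_range(row["start"], row["end"], freq="h", inclusive="left"))
    for idx, row in df.iterrows()
}
new_start = pd.concat(pieces)  # index levels: (original row, position in block)

# Join back on the original row label, then derive the end of each slot.
res = df[["name", "email"]].join(
    new_start.reset_index(level=1, drop=True).rename("start")
)
res["end"] = res["start"] + pd.Timedelta(hours=1)
print(res)
```

With the left-closed range, the last slot of each row ends exactly at that row's original end.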

User

#3 · edited 2024-09-27 23:26:30

A combination of pandas melt and pyjanitor's complete can help transform the data:

# pip install pyjanitor
import pandas as pd
import janitor

(df.melt(['email', 'room'], value_name='Start_Date')
   .reindex([3, 1, 2, 0])
   # complete is a wrapper around pandas functions
   # to expose missing values ... in this case it exposes the
   # missing dates for each group in by
   .complete([{'Start_Date': lambda df: pd.date_range(df.min(), df.max(), freq='H')}],
             by=['email', 'room'])
   .assign(End_Date=lambda df: df.Start_Date.add(pd.Timedelta('1 hour')))
   .query('variable != "End Date"').drop(columns='variable'))

          email   room          Start_Date            End_Date
0  abc@corp.com  L11M2 2021-02-01 08:00:00 2021-02-01 09:00:00
1  abc@corp.com  L11M2 2021-02-01 09:00:00 2021-02-01 10:00:00
2  abc@corp.com  L11M2 2021-02-01 10:00:00 2021-02-01 11:00:00
4  xyz@corp.com  L12M4 2021-02-01 08:00:00 2021-02-01 09:00:00
5  xyz@corp.com  L12M4 2021-02-01 09:00:00 2021-02-01 10:00:00
