Dropping DataFrame rows by condition

Posted 2024-09-29 01:27:29

I created some data:

import pandas as pd
d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019 09:57:23', '02.10.2019 10:02:58', '02.10.2019 13:11:58', '02.10.2019 13:22:55']
     ,'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed']
     ,'Name': ['CTO', 'CTO', 'CFO', 'CFO', 'CFO' , 'CFO']}
df = pd.DataFrame(data=d)

    Time                    Action  Name
0   01.10.2019, 09:56:52    Opened  CTO
1   01.10.2019, 09:57:15    Closed  CTO
2   02.10.2019, 09:57:23    Opened  CFO
3   02.10.2019, 10:02:58    Closed  CFO
4   02.10.2019, 13:11:58    Opened  CFO
5   02.10.2019, 13:22:55    Closed  CFO

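Now I want to drop the rows where the time difference is < 5 minutes. If there are multiple rows with the same name, the rows between the first "Opened" action and the last "Closed" action should be dropped, so that each name has an "Opened" row first and then a "Closed" row. I tried this: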

mask = df.drop(df[pd.to_datetime(df["Time"]).diff().dt.seconds.gt(300)].index)

But this only shows the first three rows. How can I do this?

My output should look like this:

    Time                    Action  Name
0   02.10.2019, 09:57:23    Opened  CFO
1   02.10.2019, 13:22:55    Closed  CFO

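Because the first two rows are less than 5 minutes apart, and the third and fourth rows have the same name as before. But if the date were one day later, it should look like this: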

    Time                    Action  Name
2   02.10.2019, 09:57:23    Opened  CFO
3   02.10.2019, 10:02:58    Closed  CFO
4   03.10.2019, 13:11:58    Opened  CFO
5   03.10.2019, 13:22:55    Closed  CFO

1 answer
Answer 1 · Posted 2024-09-29 01:27:29

Maybe not the cleanest approach in the world, but it gets the job done:

import pandas as pd

d = {'Time': ['01.10.2019, 09:56:52', '01.10.2019, 09:57:15', '02.10.2019 09:57:23', '02.10.2019 10:02:58',
              '02.10.2019 13:11:58', '02.10.2019 13:22:55', '03.10.2019 14:20:44', '03.10.2019 14:30:44']
    , 'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed']
    , 'Name': ['CTO', 'CTO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO', 'CFO']}
df = pd.DataFrame(data=d)
# The strings mix "date, time" and "date time": strip the comma and parse day-first explicitly.
df['Time'] = pd.to_datetime(df['Time'].str.replace(',', '', regex=False), format='%d.%m.%Y %H:%M:%S')
df.insert(1, 'Date', df['Time'].dt.date)

# Collect the surviving rows per (Name, Date); DataFrame.append was removed in pandas 2.0.
kept = []
for name, group in df.groupby(['Name', 'Date']):
    first_open_idx = group[group['Action'] == 'Opened']['Time'].first_valid_index()
    last_close_idx = group[group['Action'] == 'Closed']['Time'].last_valid_index()

    if first_open_idx is not None and last_close_idx is not None:
        time_diff = group.loc[last_close_idx, 'Time'] - group.loc[first_open_idx, 'Time']
        # total_seconds() measures the whole span; .seconds would ignore full days.
        if time_diff.total_seconds() > 300:
            kept.append(group[group.index.isin([first_open_idx, last_close_idx])])

# Fall back to an empty frame with the same columns when no group qualifies.
out = pd.concat(kept) if kept else df.iloc[0:0]
print(out)
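
The same rule can also be written without an explicit Python loop. The following is only a sketch under a few assumptions: "first" and "last" are taken to mean earliest and latest by `Time` within each (name, calendar day) group, and the helper name `first_open_last_close`, its `min_seconds` parameter, and the temporary `Day` column are hypothetical names introduced here, not part of the question or the answer above.

```python
import pandas as pd


def first_open_last_close(df, min_seconds=300):
    """Keep the earliest 'Opened' and latest 'Closed' row per (Name, day)
    when they are more than `min_seconds` apart. Hypothetical helper."""
    work = df.assign(Day=df["Time"].dt.normalize())  # midnight of each timestamp
    keys = ["Name", "Day"]

    # Index label of the earliest 'Opened' / latest 'Closed' row in each group.
    first_open = work[work["Action"] == "Opened"].groupby(keys)["Time"].idxmin()
    last_close = work[work["Action"] == "Closed"].groupby(keys)["Time"].idxmax()

    # Only groups that have both an 'Opened' and a 'Closed' row survive the inner join.
    pairs = pd.concat(
        [first_open.rename("open_idx"), last_close.rename("close_idx")],
        axis=1,
        join="inner",
    )

    # Look up the timestamps behind each label and measure the full span.
    gap = pairs["close_idx"].map(work["Time"]) - pairs["open_idx"].map(work["Time"])
    keep = pairs[gap.dt.total_seconds() > min_seconds]

    labels = set(keep["open_idx"]) | set(keep["close_idx"])
    return df[df.index.isin(labels)]  # preserves the original row order


d = {
    "Time": ["01.10.2019, 09:56:52", "01.10.2019, 09:57:15", "02.10.2019 09:57:23",
             "02.10.2019 10:02:58", "02.10.2019 13:11:58", "02.10.2019 13:22:55",
             "03.10.2019 14:20:44", "03.10.2019 14:30:44"],
    "Action": ["Opened", "Closed", "Opened", "Closed", "Opened", "Closed", "Opened", "Closed"],
    "Name": ["CTO", "CTO", "CFO", "CFO", "CFO", "CFO", "CFO", "CFO"],
}
df = pd.DataFrame(d)
df["Time"] = pd.to_datetime(
    df["Time"].str.replace(",", "", regex=False), format="%d.%m.%Y %H:%M:%S"
)

result = first_open_last_close(df)
print(result)
```

With the sample data this keeps rows 2, 5, 6 and 7: the CTO pair is only 23 seconds apart and is dropped, while each CFO day keeps its first "Opened" and last "Closed" row. Comparing with `total_seconds()` rather than `.seconds` matters as soon as a span can exceed 24 hours, because `.seconds` only returns the sub-day component of the difference.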