How can I form groups from different columns based on a condition?

Input data:

     Air-line       City     Time   ID
0    easyJet        London   20:40   1
1    airberlin      Berlin   10:30   2
2    Emarite        Dubai    21:45   3
3    Qatar Airways  Newyork  10:30   4
4    easyJet        London   20:46   5
5    airberlin      Berlin   10:34   6
...
99   Qatar Airways  London   20:40  13
100  airberlin      Berlin   10:32  20

Desired output:

     Air-line       City     Time   ID
0    easyJet        London   20:40   1
1    airberlin      Berlin   10:30   2
2    Emarite        Dubai    21:45   3
3    Qatar Airways  Newyork  10:30   4
4    easyJet        London   20:46   1
5    airberlin      Berlin   10:34   2
...
99   Qatar Airways  London   20:40  13
100  airberlin      Berlin   10:32   2

1 answer

Anonymous user

Answer #1 · posted 2024-09-30 01:37:45

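You can bin the time column in 6-minute steps, as shown below. I use the pandas.cut function here. As the bins, I pass a sequence of datetime objects generated with pd.date_range. In pd.cut, I set right=False so that each interval includes its left endpoint and excludes its right endpoint.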

I use a small DataFrame as an example, but you'll get the idea.

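import pandas as pd


df = pd.DataFrame({
    'time': ['20:30', '20:33', '20:36', '20:40', '20:42'],
    'ID': [1, 2, 3, 4, 5],
})
# Parse the "HH:MM" strings into datetimes (pandas fills in a date part)
df['time'] = pd.to_datetime(df['time'])

# Bin edges every 6 minutes, starting at the earliest time and reaching
# one step past the latest time so that the last value falls inside a bin
start = df['time'].min()
end = df['time'].max() + pd.Timedelta('6min')
bins = pd.date_range(start, end, freq='6min')  # the '6T' alias is deprecated in recent pandas

# right=False makes every interval [left, right): left edge in, right edge out
df['time_category'] = pd.cut(df['time'], bins=bins, right=False)

# Replace each ID with the first ID in its 6-minute bin
# (observed=True groups only by the bins that actually occur)
df['ID'] = df.groupby('time_category', observed=True)['ID'].transform('first')

print(df)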

Output

                 time  ID                               time_category
0 2021-02-03 20:30:00   1  [2021-02-03 20:30:00, 2021-02-03 20:36:00)
1 2021-02-03 20:33:00   1  [2021-02-03 20:30:00, 2021-02-03 20:36:00)
2 2021-02-03 20:36:00   3  [2021-02-03 20:36:00, 2021-02-03 20:42:00)
3 2021-02-03 20:40:00   3  [2021-02-03 20:36:00, 2021-02-03 20:42:00)
4 2021-02-03 20:42:00   5  [2021-02-03 20:42:00, 2021-02-03 20:48:00)
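To make the effect of right=False concrete, here is a small sketch (the three times are made up for illustration) that cuts the same values with both settings. Because the bins start exactly at the earliest time, the default right=True leaves that earliest value outside every interval, so it becomes NaN (category code -1). With right=False it is kept, and a time that sits on an edge, such as 20:36, goes into the interval that starts there.

```python
import pandas as pd

# Made-up times; 20:36 lies exactly on a bin edge
times = pd.to_datetime(pd.Series(['20:30', '20:36', '20:41']), format='%H:%M')

# Same construction as in the answer: edges every 6 minutes from the minimum
bins = pd.date_range(times.min(), times.max() + pd.Timedelta('6min'), freq='6min')

left_closed = pd.cut(times, bins=bins, right=False)  # intervals [a, b)
right_closed = pd.cut(times, bins=bins)              # intervals (a, b], the default

# Category codes: position of the interval, -1 where the value fell outside all bins
print(left_closed.cat.codes.tolist())   # [0, 1, 1]
print(right_closed.cat.codes.tolist())  # [-1, 0, 1]
```

If you do want right=True, pd.cut also accepts include_lowest=True to make the first interval closed on the left.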

Binning without dates

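There is another approach. You mentioned that you need to avoid using dates in the grouping. Unfortunately, I couldn't extend the solution using pandas internals alone, but it can be done another way.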

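Let's manually create bins from 00:00 to 23:54 and assign a key to each of them. Then we use the categorize function to assign the corresponding key to each time value. Note that here I create a new_time column using time.strptime; it is this column that I then categorize.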

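import itertools
from functools import partial
import time

import pandas as pd

# Start of every 6-minute bin in a day: 00:00, 00:06, ..., 23:54
bins = [
    time.strptime(f'{hour}:{minute}', '%H:%M')
    for hour, minute in itertools.product(range(24), range(0, 60, 6))
]

# Key each bin start by its position: 0 -> 00:00, 1 -> 00:06, ...
bins_mapping = {
    index: value
    for index, value in enumerate(sorted(bins))
}


def categorize(t, bins_mapping):
    # Return the key of the first bin start later than t, so all times
    # in [bin k, bin k+1) share the key k+1
    for index, value in bins_mapping.items():
        if value > t:
            return index
    # t lies in the last bin [23:54, 24:00), which has no later bin start;
    # give it its own key instead of merging it with [23:48, 23:54)
    return len(bins_mapping)


df = pd.DataFrame({
    'time': ['20:30', '20:33', '20:36', '20:40', '20:42'],
    'ID': [1, 2, 3, 4, 5],
})

# Parse to time.struct_time; strptime fills in the same date (1900-01-01)
# for every value, so comparisons depend only on hour and minute
df['new_time'] = df['time'].apply(lambda x: time.strptime(x, '%H:%M'))
df['time_category'] = df['new_time'].apply(
    partial(categorize, bins_mapping=bins_mapping)
)
# Replace each ID with the first ID in its bin
df['ID'] = df.groupby('time_category')['ID'].transform('first')

print(df)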

Output

    time  ID                           new_time  time_category
0  20:30   1  (1900, 1, 1, 20, 30, 0, 0, 1, -1)            206
1  20:33   1  (1900, 1, 1, 20, 33, 0, 0, 1, -1)            206
2  20:36   3  (1900, 1, 1, 20, 36, 0, 0, 1, -1)            207
3  20:40   3  (1900, 1, 1, 20, 40, 0, 0, 1, -1)            207
4  20:42   5  (1900, 1, 1, 20, 42, 0, 0, 1, -1)            208
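The same date-free binning can also be sketched with plain integer arithmetic: convert each "HH:MM" string to minutes since midnight and integer-divide by 6. The sketch below additionally groups by the question's Air-line and City columns, on the assumption (not stated in the answer) that IDs should only be merged for the same airline and city; the sample rows are made up. Its bucket numbers are one lower than the keys produced by categorize (20:30 gets 205 rather than 206), but the 6-minute bins themselves are the same.

```python
import pandas as pd

# Made-up rows using the column names from the question
df = pd.DataFrame({
    'Air-line': ['easyJet', 'airberlin', 'easyJet', 'airberlin'],
    'City': ['London', 'Berlin', 'London', 'Berlin'],
    'Time': ['20:30', '10:30', '20:33', '10:34'],
    'ID': [1, 2, 5, 6],
})

# Minutes since midnight, computed from the "HH:MM" string without any date
hours_minutes = df['Time'].str.split(':', expand=True).astype(int)
minutes = hours_minutes[0] * 60 + hours_minutes[1]

# 6-minute bucket number: 00:00-00:05 -> 0, 00:06-00:11 -> 1, ...
df['time_category'] = minutes // 6

# Assumption: merge IDs only within the same airline, city and bucket
df['ID'] = df.groupby(['Air-line', 'City', 'time_category'])['ID'].transform('first')

print(df)
```

As with any fixed bins, two times less than 6 minutes apart can still land in neighbouring buckets: 20:40 falls in bucket 206 (20:36 to 20:41) and 20:44 in bucket 207, because the bin edge at 20:42 lies between them.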
