Add binary categorical columns for month and season based on an existing date column

Posted 2024-10-03 23:25:20

I have a dataframe with dates like this:

print(data)

          date      time  
0   2017-01-10  00:00:00        
1   2017-01-17  00:00:00        
2   2017-01-24  00:00:00        
3   2017-01-31  00:00:00        
4   2017-02-07  00:00:00        
..         ...       ...   
220 2021-04-27  00:00:00   
221 2021-05-03  00:00:00   
222 2021-05-10  00:00:00   
223 2021-05-17  00:00:00   
224 2021-05-25  00:00:00   

How can I add season columns (winter, spring, summer, fall) and month columns (january through december) using binary encoding, so that my dataframe looks like this:

print(data)

          date      time  winter  spring  summer  fall  january  february  etc.
0   2017-01-10  00:00:00       1       0       0     0        1         0   ...
1   2017-01-17  00:00:00       1       0       0     0        1         0   ...
2   2017-01-24  00:00:00       1       0       0     0        1         0   ...
3   2017-01-31  00:00:00       1       0       0     0        1         0   ...
4   2017-02-07  00:00:00       1       0       0     0        0         1   ...
..         ...       ...     ...     ...     ...   ...      ...       ...   ...
220 2021-04-27  00:00:00       0       1       0     0        0         0   ...
221 2021-05-03  00:00:00       0       1       0     0        0         0   ...
222 2021-05-10  00:00:00       0       1       0     0        0         0   ...
223 2021-05-17  00:00:00       0       1       0     0        0         0   ...
224 2021-05-25  00:00:00       0       1       0     0        0         0   ...

1 answer
User
#1 · posted 2024-10-03 23:25:20

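Try:

  1. Convert the dates to month names via calendar and Categorical.from_codes

  2. Convert the months to seasons with a little arithmetic on the month number, and convert the result to a Categorical

  3. Then call get_dummies on the new columns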
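
The arithmetic in step 2 can be checked on its own, without pandas. This is only a sketch: the names `SEASONS` and `season_of` are made up for illustration, and the mapping assumes the Northern Hemisphere convention that the season labels in this answer imply (December through February is winter).

```python
SEASONS = ['winter', 'spring', 'summer', 'fall']


def season_of(month: int) -> str:
    """Map a month number (1-12) to a season name."""
    # month % 12 sends December (12) to 0 and leaves 1-11 unchanged,
    # so December, January and February all fall in the range 0-2.
    # Integer division by 3 then puts each run of three months in one group.
    return SEASONS[month % 12 // 3]


for month in range(1, 13):
    print(month, season_of(month))
```

The same expression, `month % 12 // 3`, is what produces the integer codes for the season column.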

import calendar

import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2017-01-10', '2021-05-25', freq='MS')
})

# calendar.month_name[0] is an empty string, so skip it:
# code 0 (month - 1 for January) must map to 'January'.
df['month'] = pd.Categorical.from_codes(
    df['date'].dt.month - 1,
    categories=list(calendar.month_name)[1:],
    ordered=True
)

# month % 12 // 3 maps Dec-Feb to 0, Mar-May to 1, Jun-Aug to 2, Sep-Nov to 3.
df['season'] = pd.Categorical.from_codes(
    df['date'].dt.month % 12 // 3,
    categories=['winter', 'spring', 'summer', 'fall'],
    ordered=True
)

# dtype=int keeps the dummy columns as 0/1; newer pandas versions default to booleans.
df = pd.get_dummies(df, columns=['season', 'month'], prefix_sep='', prefix='', dtype=int)

Sample output with the Categorical columns:

        date  winter  spring  summer  ...  September  October  November  December
0 2017-02-01       1       0       0  ...          0        0         0         0
1 2017-03-01       0       1       0  ...          0        0         0         0
2 2017-04-01       0       1       0  ...          0        0         0         0
3 2017-05-01       0       1       0  ...          0        0         0         0
4 2017-06-01       0       0       1  ...          0        0         0         0
5 2017-07-01       0       0       1  ...          0        0         0         0
6 2017-08-01       0       0       1  ...          0        0         0         0
7 2017-09-01       0       0       0  ...          1        0         0         0
8 2017-10-01       0       0       0  ...          0        1         0         0
9 2017-11-01       0       0       0  ...          0        0         1         0
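
The question asks for lowercase month columns (january, february, ...) on a frame that already has a date column. A minimal sketch of the same approach under those assumptions follows; the five dates are copied from the question, while `month_names`, `season_names`, and `result` are names chosen here for illustration. It also assumes the default locale, where `calendar.month_name` yields English names.

```python
import calendar

import pandas as pd

# A handful of the dates shown in the question.
data = pd.DataFrame({
    'date': pd.to_datetime(
        ['2017-01-10', '2017-01-31', '2017-02-07', '2021-04-27', '2021-05-25']
    )
})

# Lowercase month names; the slice skips the empty string at index 0.
month_names = [name.lower() for name in calendar.month_name[1:]]
season_names = ['winter', 'spring', 'summer', 'fall']

data['month'] = pd.Categorical.from_codes(
    data['date'].dt.month - 1, categories=month_names, ordered=True
)
data['season'] = pd.Categorical.from_codes(
    data['date'].dt.month % 12 // 3, categories=season_names, ordered=True
)

# dtype=int asks for 0/1 columns instead of booleans.
result = pd.get_dummies(
    data, columns=['season', 'month'], prefix='', prefix_sep='', dtype=int
)
print(result[['date', 'winter', 'spring', 'january', 'february', 'april', 'may']])
```

Every row should end up with exactly one season flag and one month flag set, which is a quick sanity check to run on real data.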

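The benefit of the Categorical is that the dummy columns come out in the correct order, rather than in alphabetical order as they do with plain strings: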

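import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2017-01-10', '2021-05-25', freq='MS')
})

# Plain string month names ('January', 'February', ...).
df['month'] = df['date'].dt.strftime('%B')

# Same season arithmetic as before, but mapped to plain strings.
df['season'] = (
    df['date'].dt.month % 12 // 3
).replace({0: 'winter', 1: 'spring', 2: 'summer', 3: 'fall'})

# dtype=int keeps the dummy columns as 0/1; newer pandas versions default to booleans.
df = pd.get_dummies(df, columns=['season', 'month'], prefix_sep='', prefix='', dtype=int)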

Sample output with replace and dt.strftime (note that the seasons and months are sorted alphabetically):

        date  fall  spring  summer  ...  May  November  October  September
0 2017-02-01     0       0       0  ...    0         0        0          0
1 2017-03-01     0       1       0  ...    0         0        0          0
2 2017-04-01     0       1       0  ...    0         0        0          0
3 2017-05-01     0       1       0  ...    1         0        0          0
4 2017-06-01     0       0       1  ...    0         0        0          0
5 2017-07-01     0       0       1  ...    0         0        0          0
6 2017-08-01     0       0       1  ...    0         0        0          0
7 2017-09-01     1       0       0  ...    0         0        0          1
8 2017-10-01     1       0       0  ...    0         0        1          0
9 2017-11-01     1       0       0  ...    0         1        0          0
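
The ordering difference is easier to see on a tiny input. The sketch below uses made-up values (`months`, `order`) and compares `get_dummies` on plain strings against the same values wrapped in a Categorical.

```python
import pandas as pd

months = ['March', 'January', 'April']
order = ['January', 'February', 'March', 'April']

# Plain strings: only the observed values, sorted alphabetically.
as_strings = pd.get_dummies(pd.Series(months), dtype=int)

# Categorical: every declared category, in the declared order.
as_categorical = pd.get_dummies(
    pd.Series(pd.Categorical(months, categories=order, ordered=True)),
    dtype=int,
)

print(list(as_strings.columns))
print(list(as_categorical.columns))
```

A side effect worth knowing: with a Categorical, `get_dummies` also emits an all-zero column for each category that never occurs in the data (February here), so the set of columns stays stable regardless of which months happen to be present.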
