Can I create dummy variables in pandas using a date index?

Posted 2024-09-28 01:31:41


I have been searching for a way to create dummy variables in pandas from a date index, but have not found anything so far.

I have a df indexed by date:

                        dew    temp   
date
2010-01-02 00:00:00      129.0  -16     
2010-01-02 01:00:00      148.0  -15     
2010-01-02 02:00:00      159.0  -11     
2010-01-02 03:00:00      181.0   -7      
2010-01-02 04:00:00      138.0   -7   
...  

I could copy date into a regular column and then create the dummy with something like this:

df['main_hours'] = np.where((df['date'] >= '2010-01-02 03:00:00') & (df['date'] <= '2010-01-02 05:00:00'), 1, 0)

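However, I would like to create the dummy variable directly from the date index, without making date a column. Is there a way to do this in pandas? Any suggestions would be appreciated.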


3 answers
df = df.assign(main_hours=0)
df.loc[df.between_time(start_time='3:00', end_time='5:00').index, 'main_hours'] = 1
>>> df
                     dew  temp  main_hours
2010-01-02 00:00:00  129   -16           0
2010-01-02 01:00:00  148   -15           0
2010-01-02 02:00:00  159   -11           0
2010-01-02 03:00:00  181    -7           1
2010-01-02 04:00:00  138    -7           1
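Note that between_time matches on time of day only, so on an index spanning several dates it flags 03:00 to 05:00 on every date, whereas the question compares against full timestamps. A minimal sketch of the difference, using a hypothetical two-day hourly frame (`demo`, `by_time` and `by_range` are names made up for this illustration):

```python
import pandas as pd

# Hypothetical two-day hourly frame, built only for this illustration.
idx = pd.date_range("2010-01-02 00:00", periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"temp": range(48)}, index=idx)

# Time-of-day match: 03:00-05:00 on every date present in the index.
demo["by_time"] = 0
demo.loc[demo.between_time("03:00", "05:00").index, "by_time"] = 1

# Full-timestamp match: 03:00-05:00 on 2010-01-02 only.
in_range = (demo.index >= "2010-01-02 03:00:00") & (demo.index <= "2010-01-02 05:00:00")
demo["by_range"] = in_range.astype(int)

print(demo[["by_time", "by_range"]].sum())
```

If the data covers a single day, as in the sample shown in the question, the two approaches give the same column.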

IIUC:

df['main_hours'] = \
    np.where((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'),
             1,
             0)

Or cast the boolean mask to int with .astype(int), the expression timed in In [22] below.
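For reference, here is a minimal sketch of the mask-and-cast variant, which matches the expression timed in In [22]. The frame is rebuilt from the five sample rows in the question so the snippet runs on its own; the name `mask` is only illustrative.

```python
import pandas as pd

# Sample rows from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    {"dew": [129.0, 148.0, 159.0, 181.0, 138.0],
     "temp": [-16, -15, -11, -7, -7]},
    index=pd.DatetimeIndex(
        ["2010-01-02 00:00", "2010-01-02 01:00", "2010-01-02 02:00",
         "2010-01-02 03:00", "2010-01-02 04:00"],
        name="date",
    ),
)

# Comparing a DatetimeIndex with a timestamp string yields a boolean array;
# combining two such arrays with & and casting to int gives the 0/1 dummy.
mask = (df.index >= "2010-01-02 03:00:00") & (df.index <= "2010-01-02 05:00:00")
df["main_hours"] = mask.astype(int)

print(df)
```

Because the mask is a plain positional array, it also works when the index contains repeated timestamps, as in the 50,000-row frame used for the timings.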

Timing on a 50,000-row DataFrame:

In [19]: df = pd.concat([df.reset_index()] * 10**4, ignore_index=True).set_index('date')

In [20]: pd.options.display.max_rows = 10

In [21]: df
Out[21]:
                       dew  temp
date
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7
...                    ...   ...
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7

[50000 rows x 2 columns]

In [22]: %timeit ((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00')).astype(int)
1.58 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [23]: %timeit np.where((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'), 1, 0)
1.52 ms ± 28.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [24]: df.shape
Out[24]: (50000, 2)

Or use between:

pd.Series(df.index).between('2010-01-02 03:00:00',  '2010-01-02 05:00:00', inclusive=True).astype(int)

Out[1567]: 
0    0
1    0
2    0
3    1
4    1
Name: date, dtype: int32
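Two caveats apply when turning the call above into a column. The boolean `inclusive=True` form is only accepted by older pandas releases; since pandas 1.3 the parameter takes a string such as "both", and the default already includes both endpoints. Also, `pd.Series(df.index)` carries a fresh 0..n-1 index, so assigning it to `df` directly would align on labels rather than positions. A minimal sketch under those assumptions, with the frame rebuilt from the question's sample rows (the name `flags` is illustrative):

```python
import pandas as pd

# Sample rows from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    {"dew": [129.0, 148.0, 159.0, 181.0, 138.0],
     "temp": [-16, -15, -11, -7, -7]},
    index=pd.DatetimeIndex(
        ["2010-01-02 00:00", "2010-01-02 01:00", "2010-01-02 02:00",
         "2010-01-02 03:00", "2010-01-02 04:00"],
        name="date",
    ),
)

# between() includes both endpoints by default, so `inclusive` is omitted.
flags = pd.Series(df.index).between("2010-01-02 03:00:00", "2010-01-02 05:00:00")

# Convert to a NumPy array so the assignment is positional instead of
# aligning the 0..n-1 Series index against the DatetimeIndex of df.
df["main_hours"] = flags.astype(int).to_numpy()

print(df)
```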
