How to pivot N observations of a time-series column at a time

Input (hourly series):

date
2018-02-28 09:00:00    78700.0
2018-02-28 10:00:00    78900.0
2018-02-28 11:00:00    78100.0
2018-02-28 12:00:00    78100.0
2018-02-28 13:00:00    77500.0
                        ...
2018-11-30 11:00:00    70000.0
2018-11-30 12:00:00    69800.0
2018-11-30 13:00:00    69800.0
2018-11-30 14:00:00    69600.0
2018-11-30 15:00:00    69400.0

Desired output (each row holds the current value followed by the next five):

                           0        1        2        3        4        5
date
2018-02-28 09:00:00  78700.0  78900.0  78100.0  78100.0  77500.0  77100.0
2018-02-28 10:00:00  78900.0  78100.0  78100.0  77500.0  77100.0  77100.0
2018-02-28 11:00:00  78100.0  78100.0  77500.0  77100.0  77100.0  76300.0
2018-02-28 12:00:00  78100.0  77500.0  77100.0  77100.0  76300.0  76200.0
2018-02-28 13:00:00  77500.0  77100.0  77100.0  76300.0  76200.0  76700.0
...                      ...      ...      ...      ...      ...      ...
2018-11-29 12:00:00  72000.0  72000.0  71800.0  71500.0  71500.0  70000.0
2018-11-29 13:00:00  72000.0  71800.0  71500.0  71500.0  70000.0  70000.0
2018-11-29 14:00:00  71800.0  71500.0  71500.0  70000.0  70000.0  69800.0
2018-11-29 15:00:00  71500.0  71500.0  70000.0  70000.0  69800.0  69800.0
2018-11-30 09:00:00  71500.0  70000.0  70000.0  69800.0  69800.0  69600.0

1 answer

Anonymous user

#1 · posted 2024-06-23 03:42:49

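One way to get the desired output is to use a Hankel matrix plus some array slicing. The Hankel matrix can be built with scipy.linalg.hankel(). In a Hankel matrix every anti-diagonal is constant, so row i starts at the i-th observation and continues with the ones that follow it, which is exactly one sliding window per row.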
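To make the anti-diagonal property concrete, here is a small sketch on a made-up five-element array (the values are illustrative only). It shows that hankel(c) fills the area below the anti-diagonal with zeros, so only the first len(c) - n_steps + 1 rows and the first n_steps columns form complete windows.

```python
import numpy as np
from scipy.linalg import hankel

values = np.array([10, 11, 12, 13, 14])  # illustrative data
n_steps = 3

# One-argument form: first column is `values`, entries below the
# anti-diagonal are zero, and h[i, j] == values[i + j] where i + j < len(values)
full = hankel(values)
print(full.tolist()[1])  # [11, 12, 13, 14, 0]

# Keep only rows whose window is complete and the first n_steps columns
windows = full[:len(values) - n_steps + 1, :n_steps]
print(windows.tolist())  # [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
```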

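In the code below I define a custom function, time_series_to_hankel(), which takes the DataFrame, the name of the time-series column whose values should be stacked into one row, and the number of time steps per row.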

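import numpy as np
import pandas as pd
from scipy.linalg import hankel

def time_series_to_hankel(data, ts_col, n_steps):
    """Stack n_steps consecutive values of data[ts_col] into one row each."""
    values = data[ts_col].to_numpy()
    n_rows = len(values) - n_steps + 1  # number of complete windows

    # Hankel matrix with first column values[:n_rows] and last row
    # values[n_rows - 1:], so h[i, j] == values[i + j]; this builds only
    # the (n_rows x n_steps) part instead of a full len x len matrix
    h = hankel(values[:n_rows], values[n_rows - 1:])
    h_df = pd.DataFrame(h, columns=['t_' + str(i) for i in range(n_steps)])

    # keep the other columns for the first n_rows rows (drop the stacked column,
    # which was previously hard-coded as 'value'); iloc works for any index
    temp_df = data.drop(columns=[ts_col]).iloc[:n_rows].reset_index(drop=True)

    # place the original columns and the stacked values side by side
    return pd.concat([temp_df, h_df], axis=1)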

If you want to understand what each part does, I suggest running it one step at a time.
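As a sketch of that step-by-step run, here are the same stages unrolled on a five-row toy DataFrame. The column names other and value and the data are made up for illustration; this version builds the full Hankel matrix first and then slices it, which is easy to inspect on small inputs.

```python
import pandas as pd
from scipy.linalg import hankel

# Hypothetical toy frame: 'value' is stacked, 'other' is carried along
data = pd.DataFrame({'other': list('abcde'), 'value': [1, 2, 3, 4, 5]})
n_steps = 3

# Step 1: full 5 x 5 Hankel matrix of the column (zeros below the anti-diagonal)
h_full = hankel(data['value'].to_numpy())

# Step 2: keep the rows with a complete window and the first n_steps columns
n_rows = data.shape[0] - n_steps + 1  # 3 complete windows
h = h_full[:n_rows, :n_steps]

# Step 3: name the stacked columns t_0 ... t_{n_steps - 1}
h_df = pd.DataFrame(h, columns=[f't_{i}' for i in range(n_steps)])

# Step 4: take the remaining columns for the same rows and join side by side
temp_df = data.drop(columns=['value']).iloc[:n_rows]
result = pd.concat([temp_df, h_df], axis=1)
print(result)
```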

Example

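import numpy as np
import pandas as pd
from scipy.linalg import hankel

# toy data shaped like the question's dataset (hourly, 6607 rows);
# test_var is random, so its values differ on every run
df = pd.DataFrame({
    'date': pd.date_range('2018-02-28 09:00:00', '2018-11-30 15:00:00', freq='h'),  # 'H' is deprecated in pandas >= 2.2
    'test_var': np.random.randint(1, 10, size=6607),
    'value': np.linspace(78700, 69400, num=6607).astype(int)
})

time_series_to_hankel(df, 'value', n_steps=6)

Output (test_var is random, so your values in that column will differ):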
                    date  test_var    t_0    t_1    t_2    t_3    t_4    t_5
0    2018-02-28 09:00:00         7  78700  78698  78697  78695  78694  78692
1    2018-02-28 10:00:00         9  78698  78697  78695  78694  78692  78691
2    2018-02-28 11:00:00         2  78697  78695  78694  78692  78691  78690
3    2018-02-28 12:00:00         8  78695  78694  78692  78691  78690  78688
4    2018-02-28 13:00:00         1  78694  78692  78691  78690  78688  78687
...                  ...       ...    ...    ...    ...    ...    ...    ...
6597 2018-11-30 06:00:00         8  69412  69411  69409  69408  69407  69405
6598 2018-11-30 07:00:00         4  69411  69409  69408  69407  69405  69404
6599 2018-11-30 08:00:00         3  69409  69408  69407  69405  69404  69402
6600 2018-11-30 09:00:00         6  69408  69407  69405  69404  69402  69401
6601 2018-11-30 10:00:00         4  69407  69405  69404  69402  69401  69400

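As a cross-check that avoids building any Hankel matrix, NumPy's sliding_window_view (available from NumPy 1.20) returns the same windows as a read-only view. This is a swapped-in technique, not the answer's method; the sketch reuses the deterministic value column from the example so the first row can be compared with t_0 to t_5 above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# same deterministic 'value' data as in the example
values = np.linspace(78700, 69400, num=6607).astype(int)
n_steps = 6

# row i is a read-only view of values[i:i + n_steps]; nothing n x n is allocated
windows = sliding_window_view(values, n_steps)

print(windows.shape)        # (6602, 6): one row per complete window
print(windows[0].tolist())  # [78700, 78698, 78697, 78695, 78694, 78692]
```

To get the same layout as the answer, wrap it with pd.DataFrame(windows, columns=[f't_{i}' for i in range(n_steps)]); call windows.copy() first if you need a writable array.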