rolling_window or a rolling window for pandas 0.20.3

Posted 2024-09-27 23:22:27


I want to weight the values in a time series / DataFrame with a custom array, as in "How do I calculate a rolling mean with custom weights in pandas?":

import pandas as pd

ser = pd.Series([1,1,1], index=pd.date_range('1/1/2000', periods=3))
print(ser)

rm1 = pd.rolling_window(ser, window=[2,2,2], mean=False)
rm2 = pd.rolling_window(ser, window=[2,2,2]) #, mean=True

print(rm1)
#
#2000-01-01   NaN
#2000-01-02   NaN
#2000-01-03     6
#Freq: D, dtype: float64
print(rm2)
#
#2000-01-01   NaN
#2000-01-02   NaN
#2000-01-03     1
#Freq: D, dtype: float64

But pd.rolling_window no longer seems to exist in pandas 0.20.3. How can I do this now?

As it stands, I get the error:

ValueError: window must be an integer


2 Answers

I was specifically interested in aging (down-weighting older values) with a half-Gaussian function. This seems to work well:

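import numpy as np
from scipy.stats import norm

def half_gaussian_convolution(values):
    # Offsets run from -(n - 1) for the oldest value up to 0 for the newest.
    offsets = np.arange(-len(values) + 1, 1)
    # 1.6448536269514722 is the z-score of the 95th percentile, norm.ppf(0.95),
    # so the oldest value sits that many standard deviations from the peak.
    normal_weighting = norm.pdf(offsets, scale=(len(values) - 1) / 1.6448536269514722)
    normal_weighting = normal_weighting / np.sum(normal_weighting)
    return np.sum(normal_weighting * values)

# With window=4, the series needs at least 4 values to produce a non-NaN result.
ser.rolling(window=4, center=False).apply(func=half_gaussian_convolution)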
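To see what the weighting above looks like, the weights for one window can be computed on their own. The following is a minimal sketch, not part of the original answer: `half_gaussian_weights` is a hypothetical helper name, and it uses plain NumPy instead of `norm.pdf`, on the assumption that the Gaussian's constant factor cancels out once the weights are normalized.

```python
import numpy as np

def half_gaussian_weights(n, z=1.6448536269514722):
    """Normalized half-Gaussian weights for a window of n values (n >= 2).

    Hypothetical helper for illustration only.
    """
    offsets = np.arange(-n + 1, 1)   # e.g. [-3, -2, -1, 0] for n = 4
    scale = (n - 1) / z              # oldest offset lies z standard deviations out
    # Unnormalized Gaussian; the 1 / (scale * sqrt(2 * pi)) factor is dropped
    # because it cancels in the normalization on the next line.
    raw = np.exp(-0.5 * (offsets / scale) ** 2)
    return raw / raw.sum()

weights = half_gaussian_weights(4)
print(weights)  # increasing weights that sum to 1; the newest value weighs most
```

The weights grow toward the end of the window, so the most recent observation contributes the most to each rolling result.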

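I couldn't come up with a simple solution that uses only the new rolling method. It seems the only way is to create a DataFrame and add a new column holding the weighted values.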

>>> df = pd.DataFrame([1,1,1], index=pd.date_range('1/1/2000', periods=3), columns=['value'])
>>> df['weight'] = [2, 2, 2]
>>> df['weighted'] = df['value'] * df['weight']
>>> df
            value  weight  weighted
2000-01-01      1       2         2
2000-01-02      1       2         2
2000-01-03      1       2         2

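Computing the sum is straightforward. Once the DataFrame is created, use the rolling method with sum. From the example you provided, the window size appears to be 3.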

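>>> df_rolled = df.rolling(3).sum()
>>> df_rolled['weighted']
2000-01-01    NaN
2000-01-02    NaN
2000-01-03    6.0
Freq: D, Name: weighted, dtype: float64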

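Computing the weighted mean, however, requires one more column: take the rolled sum in the weighted column and divide it by the rolled sum in the weight column. This ensures that you compute the weighted mean rather than the mean of the weighted values, and the two can differ a lot.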
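The distinction can be checked on a single window. The sketch below uses illustrative numbers and plain NumPy (both are assumptions made for the example, not part of the original answer) to compare the weighted mean with the mean of the weighted values.

```python
import numpy as np

# Illustrative values and weights for one window of size 3
values = np.array([2.0, 4.0, 6.0])
weights = np.array([1.0, 3.0, 5.0])

weighted_sum = np.sum(values * weights)          # 2 + 12 + 30 = 44
weighted_mean = weighted_sum / np.sum(weights)   # 44 / 9
mean_of_weighted = np.mean(values * weights)     # 44 / 3

print(weighted_mean)     # about 4.89
print(mean_of_weighted)  # about 14.67

# np.average with the weights argument computes the same weighted mean
assert np.isclose(weighted_mean, np.average(values, weights=weights))
```

Dividing by the sum of the weights keeps the result within the range of the original values, while dividing by the number of values does not.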

>>> df_rolled['w_mean'] = df_rolled['weighted'] / df_rolled['weight']
>>> df_rolled['w_mean']
2000-01-01    NaN
2000-01-02    NaN
2000-01-03    1.0
Freq: D, Name: w_mean, dtype: float64

Another example to check that the solution works:

>>> df['value'] = [2, 4, 6]
>>> df['weight'] = [1, 3, 5]
>>> df['weighted'] = df['value'] * df['weight']
>>> df
            value  weight  weighted
2000-01-01      2       1         2
2000-01-02      4       3        12
2000-01-03      6       5        30
>>> df_rolled = df.rolling(3).sum()
>>> df_rolled['weighted']  # weighted sum
2000-01-01     NaN
2000-01-02     NaN
2000-01-03    44.0
Freq: D, Name: weighted, dtype: float64
>>> df_rolled['w_mean'] = df_rolled['weighted'] / df_rolled['weight']
>>> df_rolled['w_mean']  # weighted mean
2000-01-01         NaN
2000-01-02         NaN
2000-01-03    4.888889
Freq: D, Name: w_mean, dtype: float64
>>> df_rolled = df.rolling(2).sum()  # window size 2
>>> df_rolled['weighted']
2000-01-01     NaN
2000-01-02    14.0
2000-01-03    42.0
Freq: D, Name: weighted, dtype: float64
>>> df_rolled['w_mean'] = df_rolled['weighted'] / df_rolled['weight']
>>> df_rolled['w_mean']
2000-01-01     NaN
2000-01-02    3.50
2000-01-03    5.25
Freq: D, Name: w_mean, dtype: float64
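When the weights are fixed per position in the window, as in the `pd.rolling_window(ser, window=[2,2,2])` call from the question, rather than attached to each row, an alternative may be to pass a function to `rolling().apply()`. This is a rough sketch under that assumption, not something the answers above propose; `weighted_sum` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 1, 1], index=pd.date_range('1/1/2000', periods=3))
weights = np.array([2, 2, 2])  # one weight per position in the window

def weighted_sum(window_values):
    # Hypothetical helper: np.asarray accepts both the ndarray and the
    # Series that different pandas versions may pass to the function.
    return np.dot(np.asarray(window_values), weights)

# Weighted rolling sum, comparable to mean=False in the old API
rolled_sum = ser.rolling(window=len(weights)).apply(weighted_sum)
# Weighted rolling mean: divide by the total weight
rolled_mean = rolled_sum / weights.sum()

print(rolled_sum)   # NaN, NaN, 6.0
print(rolled_mean)  # NaN, NaN, 1.0
```

For this input the results match the `rm1` and `rm2` outputs shown in the question. Calling a Python function for every window is likely to be slower than the built-in `sum`, so the column-based approach may still be preferable for large data.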
