Speeding up a process that references another DataFrame

def sumvals(x):
    S = (D['value'].loc[x.index] >= self.index_median.loc[x.index[-1]])
    return sum(S*(x-self.index_median.loc[x.index[-1]]))

D['value'].rolling(lookback).apply(sumvals)

3 answers

User

#1 · edited 2024-10-02 12:34:56

def sumvals(x):
    # median at the last row of the current window
    m = self.index_median.loc[x.index[-1]]
    condition = (x.index >= m)
    return sum(x[condition] - m)

D['value'].rolling(lookback).apply(sumvals)

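Since we are summing all the value items in the lookback window, there is no need to compare them with self.index. Also, going by your description, if you are taking the rows of values from D, you can simply return the following directly instead:

return sum(x[condition])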

Another option is to move the whole operation into NumPy to speed up the rolling computation. Check out the numpy_ext package for this.
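As a rough sketch of the NumPy route: the code below uses NumPy's own `sliding_window_view` (available from NumPy 1.20) rather than numpy_ext, and the function name `rolling_excess_sum` and the sample arrays are made up for illustration. It follows the formula from the question, where each window sums `value - median` over the values that are at or above the median at the window's last row.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_excess_sum(values, medians, lookback):
    """Hypothetical helper: for each full window, sum max(value - median_at_window_end, 0)."""
    values = np.asarray(values, dtype=float)
    medians = np.asarray(medians, dtype=float)
    out = np.full(values.shape, np.nan)  # incomplete windows stay NaN
    if len(values) < lookback:
        return out
    # One row per full window, shape (n - lookback + 1, lookback)
    windows = sliding_window_view(values, lookback)
    # Median that belongs to the last row of each window, as a column vector
    m = medians[lookback - 1:, None]
    out[lookback - 1:] = np.maximum(windows - m, 0.0).sum(axis=1)
    return out


# Illustrative data (assumption, not from the question)
values = np.array([1.0, 4.0, 2.0, 6.0, 3.0])
medians = np.array([np.nan, np.nan, 2.0, 2.0, 4.0])
print(rolling_excess_sum(values, medians, lookback=3))
# [nan nan  2.  6.  2.]
```

Note that `windows - m` materializes an array of shape (rows, lookback), so for very long series with a large lookback it would probably need to be processed in chunks.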

User

#2 · edited 2024-10-02 12:34:56

Starting from your example data:

import pandas as pd

lookback = 3  # window length; 3 reproduces the result shown below
df = pd.DataFrame()
df['I'] = pd.Series([1, -2, 8, -10, 3, 4, 5, 10, -20, 3])
df['I_median'] = df['I'].rolling(lookback).median()
df['Values'] = pd.Series([1, 2, 2, 3, 0, 9, 10, 8, 20, 9])

Now add shifted columns for the "Values" column:

# add one column for every position in the lookback window
for colno in range(lookback):

    # shift Values down by colno rows and subtract the median of the current row
    df['n' + str(colno)] = df['Values'].shift(colno) - df['I_median']

    # replace everything that is not positive (value not above the median, or NaN) with 0
    df['n' + str(colno)] = df['n' + str(colno)].where(df['n' + str(colno)] > 0, 0)

# sum across the new columns (they are the last `lookback` columns)
df['result'] = df[df.columns[-lookback:]].sum(axis=1)

The result column contains your result and equals

0     0.0
1     0.0
2     2.0
3    13.0
4     0.0
5     6.0
6    11.0
7    12.0
8    23.0
9    28.0
Name: result, dtype: float64
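One way to sanity-check these numbers is a plain loop over the windows. This is only a sketch: the helper name `brute_force` is made up, and `lookback = 3` is an assumption chosen because it reproduces the output above. Unlike the shifted-column version, it leaves the first `lookback - 1` rows as NaN instead of 0.

```python
import numpy as np
import pandas as pd

lookback = 3  # assumption: the window length that reproduces the output above
I = pd.Series([1, -2, 8, -10, 3, 4, 5, 10, -20, 3])
values = pd.Series([1, 2, 2, 3, 0, 9, 10, 8, 20, 9])
I_median = I.rolling(lookback).median()


def brute_force(values, medians, lookback):
    """For each full window, sum the positive parts of (value - median at window end)."""
    out = np.full(len(values), np.nan)
    for end in range(lookback - 1, len(values)):
        window = values.iloc[end - lookback + 1:end + 1].to_numpy(dtype=float)
        diff = window - medians.iloc[end]
        out[end] = diff[diff > 0].sum()
    return pd.Series(out, index=values.index, name='result')


check = brute_force(values, I_median, lookback)
print(check.iloc[lookback - 1:].tolist())
# [2.0, 13.0, 0.0, 6.0, 11.0, 12.0, 23.0, 28.0]
```

The loop is slow and only meant for verifying small inputs.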

Edit: without the shifted columns in the DataFrame

df['result'] = 0

for colno in range(lookback):
    # shift Values down by colno rows and subtract the median of the current row
    df['temp'] = df['Values'].shift(colno) - df['I_median']

    # replace everything that is not positive (value not above the median, or NaN) with 0
    df['temp'] = df['temp'].where(df['temp'] > 0, 0)

    # accumulate into the running result
    df['result'] = df['result'] + df['temp']

# drop the helper column once the loop is done
df = df.drop(columns='temp')

Performance

1M rows in the DataFrame
Lookback of 1000

import numpy as np
import pandas as pd

lookback = 1000
df = pd.DataFrame()
df['I'] = pd.Series(np.random.randint(0, 10, size=1000000))
df['I_median'] = df['I'].rolling(lookback).median()
df['Values'] = pd.Series(np.random.randint(0, 10, size=1000000))

Takes about 14 seconds
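To reproduce a measurement like this on your own machine, a small timing harness along the following lines should do. It is a sketch under assumptions: the function name `shifted_sum` is made up, the sizes are deliberately smaller than the ones quoted above so it finishes quickly, and it accumulates into a local Series instead of a `temp` column.

```python
import time

import numpy as np
import pandas as pd


def shifted_sum(df, lookback):
    """Accumulate the positive parts of (shifted Values - I_median) without helper columns."""
    result = pd.Series(0.0, index=df.index)
    for k in range(lookback):
        diff = df['Values'].shift(k) - df['I_median']
        # non-positive differences and NaN contribute 0
        result += diff.where(diff > 0, 0)
    return result


# Hypothetical sizes, smaller than the 1M rows / 1000 lookback quoted above
n, lookback = 100_000, 100
rng = np.random.default_rng(0)
df = pd.DataFrame({'I': rng.integers(0, 10, size=n),
                   'Values': rng.integers(0, 10, size=n)})
df['I_median'] = df['I'].rolling(lookback).median()

start = time.perf_counter()
result = shifted_sum(df, lookback)
elapsed = time.perf_counter() - start
print(f"{n} rows, lookback {lookback}: {elapsed:.2f} s")
```

The loop runs once per lookback step and each step touches every row, so the runtime should grow roughly with rows times lookback.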

User

#3 · edited 2024-10-02 12:34:56

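.loc is slow and apply is slow. It seems to me that you can get what you need with vectorized functions and column-wise operations, without any row-by-row apply or .loc lookups.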

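Without a real data example, as @Manakin suggested, it is hard to tell. But I tried to recreate your problem with an example and solve it based on your description.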

import pandas as pd

# lookback window
lookback = 3

# Fixed Index
I = [5, 2, 1, 4, 2, 4, 1, 2, 1, 10]

# Dataframe with value column, Index added as column for convenience
df = pd.DataFrame({'I': I,
                   'value': [6, 5, 4, 3, 2, 1, 2, 3, 4, 5]},
                  index=I)

# Median over lookback window
df['I_median'] = df.I.rolling(lookback).median()

This yields

|    | I  | value | I_median |
|----|----|-------|----------|
| 5  | 5  | 6     | NaN      |
| 2  | 2  | 5     | NaN      |
| 1  | 1  | 4     | 2.0      |
| 4  | 4  | 3     | 2.0      |
| 2  | 2  | 2     | 2.0      |
| 4  | 4  | 1     | 4.0      |
| 1  | 1  | 2     | 2.0      |
| 2  | 2  | 3     | 2.0      |
| 1  | 1  | 4     | 1.0      |
| 10 | 10 | 5     | 2.0      |

# Check if Index is greater than median
df['I_gt'] = df.I > df.I_median

# set value to 0 in all rows where the index is not greater than the median
df['filtered_val'] = df.value.where(df.I_gt, 0)

|    | I  | value | I_median | I_gt  | filtered_val |
|----|----|-------|----------|-------|--------------|
| 5  | 5  | 6     | NaN      | False | 0            |
| 2  | 2  | 5     | NaN      | False | 0            |
| 1  | 1  | 4     | 2.0      | False | 0            |
| 4  | 4  | 3     | 2.0      | True  | 3            |
| 2  | 2  | 2     | 2.0      | False | 0            |
| 4  | 4  | 1     | 4.0      | False | 0            |
| 1  | 1  | 2     | 2.0      | False | 0            |
| 2  | 2  | 3     | 2.0      | False | 0            |
| 1  | 1  | 4     | 1.0      | False | 0            |
| 10 | 10 | 5     | 2.0      | True  | 5            |

Then simply take a rolling sum over the filtered column:

df.filtered_val.rolling(lookback).sum()
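Put together, the steps of this answer can be sketched as one small function. The name `rolling_filtered_sum` is made up for illustration. Note that, as in the steps above, each row is compared with its own rolling median rather than with the median at the end of each window, which is this answer's reading of the problem.

```python
import pandas as pd


def rolling_filtered_sum(index_vals, values, lookback):
    """Hypothetical wrapper: zero out values whose index is not above its
    rolling median, then take a rolling sum over what is left."""
    idx = pd.Series(index_vals, dtype=float)
    val = pd.Series(values, dtype=float)
    median = idx.rolling(lookback).median()
    # comparisons against NaN are False, so the first rows are set to 0 as well
    filtered = val.where(idx > median, 0)
    return filtered.rolling(lookback).sum()


out = rolling_filtered_sum([5, 2, 1, 4, 2, 4, 1, 2, 1, 10],
                           [6, 5, 4, 3, 2, 1, 2, 3, 4, 5],
                           lookback=3)
print(out.tolist())
# [nan, nan, 0.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 5.0]
```

Everything here is a column-wise operation, so there is no Python-level loop over rows or windows.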

