Stacking arrays with overlapping indices: looking for a vectorized replacement for a loop

Posted 2024-06-25 05:18:21


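I'm looking for a vectorized way to loop over array indices and stack them vertically in groups with overlapping indices.
To give the gist of what I want to achieve: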

Given a list [1,2,3,4,5,6], an interval variable with value 2 and an overlap variable with value 1, the output should look like this: [[1,2],[2,3],[3,4],[4,5],[5,6]]
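A minimal plain-Python sketch of that toy case. The function name overlap_windows is hypothetical. Each window start advances by interval - overlap, and the range stops at the last start that still yields a full window, so no padding is involved.

```python
def overlap_windows(seq, interv, overlap):
    step = interv - overlap  # how far each window start moves
    # only full-length windows: the last valid start is len(seq) - interv
    return [seq[i:i + interv] for i in range(0, len(seq) - interv + 1, step)]

print(overlap_windows([1, 2, 3, 4, 5, 6], 2, 1))  # [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
```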

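However, my actual data has shape 1560x2x87236, where 1560 is the number of subjects and 2x87236 holds the x, y trajectories. So for each subject I have 87236 x points and 87236 y points. It is crucial that the transformation preserves the dimension of size 2 that holds the xs and ys.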


To simplify the representation:

Suppose I have an array:

arr

array([[[35, 33, 34, 42, 32, 30],
        [22, 38, 29, 33, 25, 14]],
       [[17, 25, 39, 17, 41, 22],
        [22, 13, 14, 31, 20, 38]],
       [[30, 10, 33, 25, 38, 26],
        [28, 27, 19, 27, 43, 13]]])

arr.shape

(3, 2, 6)

What I want is to stack this array in groups (intervals) of 3 indices that overlap by 1 index. The output looks like this:

stacked_arr

array([[[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[35., 33., 34.],
        [22., 38., 29.]],

       [[34., 42., 32.],
        [29., 33., 25.]],

       [[17., 25., 39.],
        [22., 13., 14.]],

       [[39., 17., 41.],
        [14., 31., 20.]],

       [[30., 10., 33.],
        [28., 27., 19.]],

       [[33., 25., 38.],
        [19., 27., 43.]]])

stacked_arr.shape

(7, 2, 3)
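As a sanity check on the shape (7, 2, 3): it is one leading all-zeros window from the initialization plus 2 full windows for each of the 3 subjects. Note, though, that the loop in the function below uses range(0, 6, 2), which also visits start 4; that window ([32, 30] for subject 0) is shorter than 3 and gets padded. If I read the loop correctly, the function as written therefore returns 10 windows rather than the 7 shown. The arithmetic, as a small sketch:

```python
n_points, interv, overlapby, n_subjects = 6, 3, 1, 3
step = interv - overlapby

full_windows = (n_points - interv) // step + 1   # starts 0, 2: windows that fit without padding
range_windows = len(range(0, n_points, step))    # starts 0, 2, 4: what the padded loop visits

print(1 + n_subjects * full_windows)   # 7, matches stacked_arr.shape[0]
print(1 + n_subjects * range_windows)  # 10, what the padded loop would produce
```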

Here is the function I wrote to achieve the above result:

import cupy as cp

def overlap_stack(data, padwith, interv, overlapby):
    sub = 0

    # Initialise with one all-zeros window: 1 for a single window, 2 for the x and y rows
    stacked = cp.zeros(shape=(1, 2, interv))
    while sub < data.shape[0]:
        for idx in range(0, data.shape[2], interv - overlapby):

            # take one window of interv columns for this subject, keeping both x and y rows
            stack = cp.expand_dims(data[sub, :, idx: idx + interv], axis=0)

            # pad the window if it is shorter than interv (happens near the end)
            if stack.shape[2] < interv:
                stack = cp.pad(stack, ((0, 0), (0, 0), (0, interv - stack.shape[2])), 'constant',
                               constant_values=padwith)

            # append this window to the result
            stacked = cp.vstack((stacked, stack))

        sub += 1
    return stacked

Converting the 1560x2x87236 array takes 8 to 10 hours. I'd be very grateful for any help speeding this up.
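Most of those hours probably go into cp.vstack inside the loop: every call allocates a new array and copies everything stacked so far, so the total copying grows quadratically with the number of windows. A minimal sketch of the same loop that collects windows in a list and stacks once at the end. It is written with NumPy so it runs on the CPU; the name overlap_stack_list is hypothetical, and although CuPy has cp.pad and cp.stack, I have not tested a CuPy port. Unlike the original, it has no leading all-zeros window and keeps the input dtype.

```python
import numpy as np

def overlap_stack_list(data, padwith, interv, overlapby):
    """Hypothetical loop version: same windows as the question's function
    (including the padded tail window), but stacked once at the end and
    without the leading all-zeros window."""
    step = interv - overlapby
    windows = []
    for sub in range(data.shape[0]):
        for idx in range(0, data.shape[2], step):
            win = data[sub, :, idx:idx + interv]      # (2, <= interv): x row and y row
            if win.shape[1] < interv:                 # tail window is too short: pad it
                win = np.pad(win, ((0, 0), (0, interv - win.shape[1])),
                             constant_values=padwith)
            windows.append(win)                       # appending does not copy the stacked data
    return np.stack(windows)                          # one allocation: (n_windows, 2, interv)

data = np.arange(36).reshape(3, 2, 6)   # toy input: 3 subjects, x/y rows, 6 points
out = overlap_stack_list(data, -1, 3, 1)
print(out.shape)  # (9, 2, 3)
print(out[2])     # subject 0, padded tail window
```

This is still a Python loop, so it will not be as fast as a fully vectorized approach, but it removes the quadratic copying.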


1 answer
User
Answer 1 · posted 2024-06-25 05:18:21

I don't know whether you're familiar with numpy.lib.stride_tricks.as_strided, but here is a solution that uses it:

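import numpy as np
from numpy.lib.stride_tricks import as_strided

def overlap_stack(data, interv, overlapby):
    # Flatten subjects into rows: (subjects, 2, n) -> (subjects * 2, n)
    A = np.vstack(data)

    # Each window covers the 2 rows (x, y) of one subject and interv columns
    window_size = (data.shape[1], interv)
    # Step 2 rows to reach the next subject, interv - overlapby columns to reach the next window
    strides = (window_size[0], interv - overlapby)

    # Byte strides: the first two axes step between windows, the last two walk inside a window
    output_strides = (strides[0]*A.strides[0], strides[1]*A.strides[1]) + A.strides

    # (subjects, windows per subject, 2, interv); windows that would run past the end are dropped
    output_shape = ((A.shape[0] - window_size[0])//strides[0] + 1,
                    (A.shape[1] - window_size[1])//strides[1] + 1) + window_size

    # as_strided builds a view without copying; reshape merges subjects and windows into one axis
    return as_strided(A, shape=output_shape, strides=output_strides).reshape(-1, *output_shape[2:])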

I've left out the padding because I'm not sure how you want to add it (you can add it yourself, though).
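One way to add the padding while staying vectorized, as a sketch: pad the last axis just enough that the final window (the one the question's loop pads) becomes complete, then take windows with numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) and keep every (interv - overlapby)-th start. This swaps as_strided for sliding_window_view, which does the stride bookkeeping for you. The name overlap_stack_padded is hypothetical; it assumes 0 <= overlapby < interv and, unlike the question's function, does not prepend an all-zeros window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def overlap_stack_padded(data, padwith, interv, overlapby):
    """Hypothetical vectorized version with tail padding.

    data: (subjects, 2, n_points). Assumes 0 <= overlapby < interv.
    Returns (subjects * n_windows, 2, interv), windows ordered by subject.
    """
    step = interv - overlapby
    n_points = data.shape[2]
    n_windows = -(-n_points // step)               # ceil division == len(range(0, n_points, step))
    needed = (n_windows - 1) * step + interv       # length that makes the last window complete
    padded = np.pad(data, ((0, 0), (0, 0), (0, needed - n_points)),
                    constant_values=padwith)
    # Read-only view of every length-interv window: (subjects, 2, positions, interv);
    # keep only the window starts 0, step, 2*step, ...
    win = sliding_window_view(padded, interv, axis=2)[:, :, ::step]
    # (subjects, windows, 2, interv), then merge subjects and windows (reshape copies here)
    return win.transpose(0, 2, 1, 3).reshape(-1, data.shape[1], interv)

data = np.arange(36).reshape(3, 2, 6)   # toy input: 3 subjects, x/y rows, 6 points
out = overlap_stack_padded(data, -1, 3, 1)
print(out.shape)  # (9, 2, 3)
```

If padwith is NaN, convert data to a float dtype first, because an integer array cannot hold NaN.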

For example:

data = np.array([[[35, 33, 34, 42, 32, 30],
                  [22, 38, 29, 33, 25, 14]],
                 [[17, 25, 39, 17, 41, 22],
                  [22, 13, 14, 31, 20, 38]],
                 [[30, 10, 33, 25, 38, 26],
                  [28, 27, 19, 27, 43, 13]]])

overlap_stack(data, 3, 1)

array([[[35, 33, 34],
        [22, 38, 29]],

       [[34, 42, 32],
        [29, 33, 25]],

       [[17, 25, 39],
        [22, 13, 14]],

       [[39, 17, 41],
        [14, 31, 20]],

       [[30, 10, 33],
        [28, 27, 19]],

       [[33, 25, 38],
        [19, 27, 43]]])

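Note that for an array of shape (1560, 2, 87236) this will be very fast, but it needs a lot of memory: as_strided itself only creates a view, while the final reshape copies every window into a new array.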
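To put a number on "a lot of memory", here is a rough estimate. The helper windowed_nbytes is hypothetical; it assumes 8-byte elements (int64 or float64) and counts only full windows, like the as_strided solution. The real interv and overlapby are not given, so the toy values interv=3, overlapby=1 are an assumption. With those values the output is about 3.27 GB, roughly 1.5 times the 2.18 GB input, since each point lands in about interv / (interv - overlapby) windows.

```python
def windowed_nbytes(n_subjects, n_points, interv, overlapby, n_rows=2, itemsize=8):
    """Hypothetical helper: bytes needed for the stacked output (no padding)."""
    step = interv - overlapby
    n_windows = (n_points - interv) // step + 1    # full windows per subject
    return n_subjects * n_windows * n_rows * interv * itemsize

out_bytes = windowed_nbytes(1560, 87236, 3, 1)     # interv=3, overlapby=1 assumed
in_bytes = 1560 * 2 * 87236 * 8                    # the input itself, 8-byte dtype
print(out_bytes / 1e9, in_bytes / 1e9)             # ~3.27 GB vs ~2.18 GB
```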
