How can I improve speed while keeping memory usage under control with NumPy arrays?

Posted 2024-10-04 09:30:08


I need to write code that performs a one-sample t-test, given the sample mean (E(X)) and the sample second raw moment (E(X^2)) for every entry of a 2D array.
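For reference, the statistic that both attempts in this question compute can be written out as follows. This is a sketch read off the code, assuming a null-hypothesis mean of zero and n = nsubs observations per entry; the symbols are introduced here for illustration only.

```latex
\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad
\overline{x^2} = \frac{1}{n}\sum_{k=1}^{n} x_k^2, \qquad
s^2 = \frac{n}{n-1}\left(\overline{x^2} - \bar{x}^2\right), \qquad
t = \frac{\bar{x}}{\sqrt{s^2/n}}, \qquad
p = P\left(T_{n-1} > t\right)
```

Here T_{n-1} denotes a Student t variable with n-1 degrees of freedom, so p is the one-sided (upper-tail) p-value.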

I have tried two approaches, but neither works well.

  1. Using NumPy vectorized operations: this fails with an out-of-memory error for arrays above a certain size
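import numpy as np
from scipy.stats import t  # assumed: t is the Student t distribution from SciPy

def calc_normal_pvals(vt_sum_counter, vt_ssum_counter):
    global nsubs  # sample size, defined at module level
    # Turn the running sums into the sample mean and second raw moment
    vt_sum_counter = vt_sum_counter/nsubs
    vt_ssum_counter = vt_ssum_counter/nsubs
    # Unbiased sample variance from the two moments
    sample_var = nsubs * (vt_ssum_counter - np.square(vt_sum_counter))/(nsubs - 1)
    # t statistic: mean divided by its standard error
    t_array = np.divide(vt_sum_counter, (np.sqrt(sample_var/nsubs)))
    # One-sided p-value with nsubs-1 degrees of freedom; NaN entries become 0
    pvals = t.sf(t_array, nsubs-1)
    pvals[np.isnan(pvals)] = 0
    return pvals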
  2. A plain for-loop approach: this takes far longer by comparison
def calc_normal_pvals(vt_sum_counter, vt_ssum_counter, tail=1):
    # Note: tail is accepted but not used below
    global nsubs
    V, T = vt_sum_counter.shape
    pvals = np.zeros((V, T))
    for i in range(V):
        for j in range(T):
            # Standard error of the mean, computed from the raw sums
            sigma = ((vt_ssum_counter[i, j]/nsubs -(vt_sum_counter[i,j]/nsubs)**2)/(nsubs - 1))**0.5
            # Entries with zero standard error keep a p-value of 0
            if sigma != 0:
                pvals[i, j] = t.sf(vt_sum_counter[i, j]/(nsubs*sigma), nsubs-1)
    return pvals

The input arrays are very large, typically around 900000 x 400.
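One possible middle ground between the two attempts is to stay vectorized but process the rows in blocks, so that the temporary arrays are only as large as one block. The sketch below is not from the original post: the function name `calc_pvals_chunked` and the `chunk_rows` parameter are hypothetical, `nsubs` is passed as an argument instead of being read as a global, and `t` is assumed to be `scipy.stats.t`.

```python
import numpy as np
from scipy.stats import t


def calc_pvals_chunked(vt_sum, vt_ssum, nsubs, chunk_rows=10_000):
    """One-sided one-sample t-test p-values, computed one block of rows at a time.

    vt_sum and vt_ssum hold the per-entry sum of x and sum of x**2 over nsubs
    observations. chunk_rows (hypothetical parameter) bounds the temporaries.
    """
    n_rows, n_cols = vt_sum.shape
    pvals = np.zeros((n_rows, n_cols))
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        mean = vt_sum[start:stop] / nsubs
        mean_sq = vt_ssum[start:stop] / nsubs
        # Zero or slightly negative variances produce NaN or inf here;
        # silence the warnings and clean up the NaNs afterwards.
        with np.errstate(divide="ignore", invalid="ignore"):
            # Standard error of the mean: sqrt(s^2 / n), s^2 the unbiased variance
            sem = np.sqrt((mean_sq - mean**2) / (nsubs - 1))
            block = t.sf(mean / sem, nsubs - 1)
        # Same convention as the vectorized attempt: NaN p-values become 0
        block[np.isnan(block)] = 0
        pvals[start:stop] = block
    return pvals
```

With a 900000 x 400 input, each full-size float64 array takes roughly 2.9 GB, so the output alone is still that large; the chunking only removes the additional full-size temporaries. If that is still too much, storing the result as float32 or in an `np.memmap` might be worth trying, though neither was tested here.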

