Faster double iteration over a single array in Python

import numpy as np

def pairwise(agree, disagree):
    return agree / (agree + disagree)

def pairwise_computing_array(df):
    humanScores = np.array(df['Judgement'])
    pagerankScores = np.array(df['PR_Score'])
    total = 0
    agree = 0
    disagree = 0
    for i in range(len(df) - 1):
        for j in range(i + 1, len(df)):
            total += 1
            human = humanScores[i] - humanScores[j]  # difference of human judgements
            if human != 0:
                pr = pagerankScores[i] - pagerankScores[j]  # difference of PageRank scores
                if pr != 0:
                    if np.sign(human) == np.sign(pr):
                        agree += 1  # they agree on which of the two is better
                    else:
                        disagree += 1  # they disagree on which of the two is better
    pairwise_accuracy = pairwise(agree, disagree)
    return (agree, disagree, total, pairwise_accuracy)
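The counting rule in the loop above can be restated compactly: every unordered pair (i, j) with i < j is visited once, pairs where either difference is zero are skipped, and the remaining pairs "agree" when both differences have the same sign. The sketch below (the function name `pairwise_counts` and the sample data are hypothetical) enumerates the same pairs with `itertools.combinations`; it only pins down the semantics on a tiny input and is not meant to be faster.

```python
from itertools import combinations

def pairwise_counts(human, pr):
    """Same counting rule as the nested loop, written with combinations()."""
    agree = disagree = total = 0
    # combinations(..., 2) yields each pair (i, j) with i < j exactly once
    for (h_i, p_i), (h_j, p_j) in combinations(zip(human, pr), 2):
        total += 1
        dh, dp = h_i - h_j, p_i - p_j
        if dh == 0 or dp == 0:
            continue  # a tie on either side counts as neither agree nor disagree
        if (dh > 0) == (dp > 0):
            agree += 1  # both orderings put the same item first
        else:
            disagree += 1
    return agree, disagree, total

print(pairwise_counts([3, 1, 2, 2], [0.9, 0.1, 0.5, 0.05]))  # (4, 1, 6)
```

Here the pair (1, 3) disagrees, the pair (2, 3) is a human tie and is skipped, and the pairwise accuracy is 4 / (4 + 1) = 0.8.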

3 answers

User

Answer 1 · edited 2024-10-03 19:28:29

Here is code that runs in a reasonable time, following a suggestion from @juanpa.arrivillaga:

import numpy as np
from numba import jit

# Call with plain NumPy arrays, e.g.
# pairwise_computing(df['Judgement'].to_numpy(), df['PR_Score'].to_numpy())
@jit(nopython=True)
def pairwise_computing(humanScores, pagerankScores):
    total = 0
    agree = 0
    disagree = 0

    for i in range(len(humanScores) - 1):
        for j in range(i + 1, len(humanScores)):
            total += 1
            human = humanScores[i] - humanScores[j]  # difference of human judgements
            if human != 0:
                pr = pagerankScores[i] - pagerankScores[j]  # difference of PageRank scores
                if pr != 0:
                    if np.sign(human) == np.sign(pr):
                        agree += 1  # they agree on which of the two is better
                    else:
                        disagree += 1  # they disagree on which of the two is better
    pairwise_accuracy = agree / (agree + disagree)
    return (agree, disagree, total, pairwise_accuracy)

This is the timing I get on my whole dataset (58k rows):

7.98 s ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
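For scale: with n ≈ 58,000 rows the loop visits n(n − 1)/2 ≈ 1.68 billion pairs, so 7.98 s works out to roughly 5 ns per pair. A usage sketch follows, assuming the columns are passed as NumPy arrays (for example via `.to_numpy()`). The kernel `count_pairs` is a hypothetical, condensed restatement of the jitted function above, and the `ImportError` fallback is an assumption added so the sketch still runs (as slow plain Python) when numba is not installed. Note that the first call to a numba function includes JIT compilation, so time it only after a warm-up call.

```python
import numpy as np

try:
    from numba import njit  # njit is numba's shorthand for jit(nopython=True)
except ImportError:
    # Assumption: without numba, run the same function as plain (slow) Python
    def njit(func):
        return func

@njit
def count_pairs(human, pr):
    n = human.shape[0]
    agree = 0
    disagree = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dh = human[i] - human[j]
            dp = pr[i] - pr[j]
            if dh != 0 and dp != 0:        # skip ties on either side
                if (dh > 0) == (dp > 0):   # same sign: same item ranked higher
                    agree += 1
                else:
                    disagree += 1
    total = n * (n - 1) // 2               # every pair with i < j is counted in total
    return agree, disagree, total

# Hypothetical data; with a DataFrame you would pass
# df['Judgement'].to_numpy() and df['PR_Score'].to_numpy() instead.
human = np.array([3, 1, 2, 2], dtype=np.int64)
pr = np.array([0.9, 0.1, 0.5, 0.05])

count_pairs(human, pr)  # warm-up call: triggers compilation when numba is present
agree, disagree, total = count_pairs(human, pr)
print(agree, disagree, total, agree / (agree + disagree))  # 4 1 6 0.8
```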

User

Answer 2 · edited 2024-10-03 19:28:29

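The inner for loop can be eliminated with broadcasting, because index j always runs ahead of index i (that is, we never look back). There is one small snag, though, in computing agreement/disagreement: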

if np.sign(human) == np.sign(pr):

I'm not sure how to solve that, so I'm only providing skeleton code here for you to adjust and get working, since you understand the problem better.

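A minimal sketch of the idea described in this answer (an illustration, not the answer's own code): keep the outer loop over i and replace the inner loop over j > i with array operations on the slice `scores[i + 1:]`. Because the comparison becomes element-wise on arrays, the `np.sign(human) == np.sign(pr)` test no longer needs to be a scalar `if`. The function name `pairwise_row_broadcast` and the sample data are hypothetical.

```python
import numpy as np

def pairwise_row_broadcast(human, pr):
    """One Python loop over i; the inner loop over j > i becomes array operations."""
    human = np.asarray(human)
    pr = np.asarray(pr)
    n = len(human)
    agree = 0
    disagree = 0
    for i in range(n - 1):
        # Signs of the differences between element i and every later element j > i
        sh = np.sign(human[i] - human[i + 1:])
        sp = np.sign(pr[i] - pr[i + 1:])
        valid = (sh != 0) & (sp != 0)  # drop ties on either side
        agree += int(np.count_nonzero(valid & (sh == sp)))
        disagree += int(np.count_nonzero(valid & (sh != sp)))
    total = n * (n - 1) // 2
    return agree, disagree, total

print(pairwise_row_broadcast([3, 1, 2, 2], [0.9, 0.1, 0.5, 0.05]))  # (4, 1, 6)
```

This still runs n − 1 Python-level iterations, but each one does O(n) vectorized work and needs only O(n) extra memory.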

User

Answer 3 · edited 2024-10-03 19:28:29

You have NumPy arrays, so why not use them directly? That offloads the work from Python to compiled C code (usually, but not always):

First, reshape the vectors into 1×N matrices:

import numpy as np

# reshape returns a new array; ndarray.resize works in place and returns None
humanScores = np.array(df['Judgement']).reshape(1, -1)
pagerankScores = np.array(df['PR_Score']).reshape(1, -1)

Then take the differences; we are only interested in their signs:

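humanDiff = (humanScores - humanScores.T).clip(-1, 1)
pagerankDiff = (pagerankScores - pagerankScores.T).clip(-1, 1)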

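Here I assume the data are integers, so clip will only produce -1, 0 or 1. If the scores are floats, clip leaves any difference between -1 and 1 unchanged, so use np.sign instead. Then you can count: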
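The integer assumption matters. The sketch below (hypothetical sample data) builds the N × N difference matrices by broadcasting a (1, N) row against its (N, 1) transpose and shows that, for float scores, clip leaves fractional differences that never equal the -1/1 values of the integer side, while np.sign gives proper signs. The counts here are over the full matrix, so each pair appears twice.

```python
import numpy as np

# Hypothetical scores, shaped as 1 x N row vectors like humanScores / pagerankScores
human = np.array([[3, 1, 2, 2]])
pr = np.array([[0.9, 0.1, 0.5, 0.05]])

# (N, 1) minus (1, N) broadcasts to N x N: entry [i, j] = score[i] - score[j]
human_diff = (human.T - human).clip(-1, 1)  # integer data: clip already gives -1, 0 or 1
pr_clip = (pr.T - pr).clip(-1, 1)           # float data: values in (-1, 1) pass through unchanged
pr_sign = np.sign(pr.T - pr)                # np.sign maps every difference to -1.0, 0.0 or 1.0

# Full-matrix counts: each pair is seen twice, as [i, j] and as [j, i]
agree_clip = int(((human_diff != 0) & (pr_clip != 0) & (human_diff == pr_clip)).sum())
agree_sign = int(((human_diff != 0) & (pr_sign != 0) & (human_diff == pr_sign)).sum())
print(agree_clip, agree_sign)  # 0 8
```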

agree = ((humanDiff != 0) & (pagerankDiff != 0) & (humanDiff == pagerankDiff)).sum()
disagree = ((humanDiff != 0) & (pagerankDiff != 0) & (humanDiff != pagerankDiff)).sum()

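But these counts include every pair twice, because entries (i, j) and (j, i) have exactly opposite signs in both humanDiff and pagerankDiff, so they agree or disagree together. You can count only the part of the square matrices strictly above the diagonal: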

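upper = np.triu(np.ones(humanDiff.shape, dtype=bool), k=1)  # True only for pairs i < j
agree = ((humanDiff != 0) &
         (pagerankDiff != 0) &
         (humanDiff == pagerankDiff) &
         upper
        ).sum()
disagree = ((humanDiff != 0) & (pagerankDiff != 0) & (humanDiff != pagerankDiff) & upper).sum()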
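Putting the steps of this answer together, here is a self-contained sketch (the function name `pairwise_matrix` is hypothetical) that uses np.sign, so it also works for float scores, and masks the strict upper triangle so each pair is counted once. Be aware of memory: for the 58k-row dataset each N × N matrix has about 3.4 billion entries, and np.sign on float64 input returns float64, so a single matrix would need roughly 27 GB. This approach is therefore only practical for smaller N or when applied block by block.

```python
import numpy as np

def pairwise_matrix(human, pr):
    """Count agreeing/disagreeing pairs with full N x N sign matrices (O(N^2) memory)."""
    h = np.asarray(human).reshape(1, -1)
    p = np.asarray(pr).reshape(1, -1)
    n = h.shape[1]
    h_sign = np.sign(h.T - h)  # [i, j] = sign(human[i] - human[j])
    p_sign = np.sign(p.T - p)
    # Keep only entries strictly above the diagonal, i.e. each pair i < j once
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    valid = upper & (h_sign != 0) & (p_sign != 0)
    agree = int(np.count_nonzero(valid & (h_sign == p_sign)))
    disagree = int(np.count_nonzero(valid & (h_sign != p_sign)))
    total = n * (n - 1) // 2
    return agree, disagree, total

print(pairwise_matrix([3, 1, 2, 2], [0.9, 0.1, 0.5, 0.05]))  # (4, 1, 6)
```

Since (i, j) and (j, i) always agree or disagree together, an equivalent alternative is to count over the full matrix (excluding the diagonal, which is zero anyway) and halve the result.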
