Suggestions for faster for/if statements in my code?

import numpy as np

file = open('input.txt', 'r')
coordset = set()
data = np.zeros((600, 4)) * np.nan
irow = 0
ctr = 0

for row in file:
    item = row.split()
    x = float(item[0])
    y = float(item[1])
    z = float(item[2])

    # build unique grid of coords
    if (x, y) not in coordset:
        data[irow][0] = x
        data[irow][1] = y
        data[irow][2] = z
        irow = irow + 1  # grows up to 599

    # lookup table of unique coords
    coordset.add((x, y))

    # BOTTLENECK. replace ifs? for?
    for i in range(0, irow):
        if data[i][0] == x and data[i][1] == y:
            if z > data[i][2]:
                continue
            elif z == data[i][2]:
                ctr = ctr + 1
                data[i][3] = ctr
            if z < data[i][2]:
                data[i][2] = z
                ctr = 1
                data[i][3] = ctr

3 Answers

User

Answer 1 · edited 2024-09-30 04:34:37

To do this in numpy, use np.unique.

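import numpy as np

def count_unique(a):
    # View each row as one opaque (void) item so np.unique compares whole rows.
    row_view = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    # ravel() keeps the inverse index 1-D, which np.bincount requires.
    ua, uind = np.unique(row_view.ravel(), return_inverse=True)
    # Convert the unique void items back into rows of the original dtype.
    unique_rows = ua.view(a.dtype).reshape(ua.shape + (-1,))
    count = np.bincount(uind)
    return np.hstack((unique_rows, count[:, None]))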

First, let's check a small array:

[small-array example not preserved in the source]
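As a stand-in for that check, here is a minimal sketch with made-up sample values. It counts duplicate rows with `np.unique(..., axis=0, return_counts=True)`, which is a different route from the void-view trick in `count_unique` above, and it assumes a NumPy version that supports the `axis` argument.

```python
import numpy as np

# Made-up sample: the row [1, 2, 3] appears twice, [4, 5, 6] once.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [1.0, 2.0, 3.0]])

# Unique rows (sorted lexicographically) and how often each one occurs.
rows, counts = np.unique(a, axis=0, return_counts=True)

# Append the counts as a fourth column, as count_unique does.
result = np.hstack((rows, counts[:, None]))
print(result)
```

Each output row holds a unique (x, y, z) triple followed by its number of occurrences.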

Looks good! Now let's check a large array:

a = np.random.rand(int(3e7), 3)
a = np.around(a, 1)

output = count_unique(a)
print(output.shape)
(1331, 4)  # As close as I can get to 600 unique elements.

print(np.sum(output[:, -1]))
30000000.0

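On my machine this takes about 33 seconds and 3 GB of memory; doing all of this in memory for large arrays is likely to be your bottleneck. For reference, @Joowani's solution took about 130 seconds, although that is a bit of an apples-to-oranges comparison because we start from a numpy array here. Your mileage may vary.

To read the data in as a numpy array, I would look at the question here, but it should look something like this: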

arr=np.genfromtxt("./input.txt", delimiter=" ")

For loading that much data from a txt file, I would really recommend the pandas example in that link.
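The linked pandas example is not reproduced here. As a rough sketch of what a pandas-based version could look like, the block below reads whitespace-separated rows and computes the lowest z per (x, y) together with how often that lowest value occurs. The column names, the `io.StringIO` stand-in for `input.txt`, and the sample values are all assumptions made for illustration.

```python
import io
import pandas as pd

# Stand-in for input.txt: whitespace-separated "x y z" rows (made-up values).
sample = io.StringIO("0 0 5\n0 0 3\n0 0 3\n1 0 2\n")

df = pd.read_csv(sample, sep=r"\s+", header=None, names=["x", "y", "z"])

# Lowest z of each (x, y) group, broadcast back onto every row of that group.
zmin = df.groupby(["x", "y"])["z"].transform("min")

# Keep only the rows that attain the minimum, then count them per coordinate.
result = (
    df[df["z"] == zmin]
    .groupby(["x", "y"])["z"]
    .agg(["min", "count"])
    .reset_index()
)
print(result)
```

For a real file, the path would be passed to `pd.read_csv` in place of the `StringIO` object.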

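User

Answer 2 · edited 2024-09-30 04:34:37

Your solution looks slow because it iterates over the list (i.e. data) on every update. A better approach is to use a dictionary, which takes O(1) per update instead of O(n).

Here is my solution using a dictionary: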

file = open('input.txt', 'r')

#coordinates
c = {}

for line in file:
    #items
    (x, y, z) = (float(n) for n in line.split())

    if (x, y) not in c:
        c[(x, y)] = [z, 1]
    elif c[(x, y)][0] > z:
        c[(x, y)][0], c[(x, y)][1] = z, 1
    elif c[(x, y)][0] == z:
        c[(x, y)][1] += 1

for key in c:
    print("{} {} {} {}".format(key[0], key[1], c[key][0], c[key][1]))
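To check the dictionary logic without a file on disk, the same update rule can be wrapped in a function and run on a few in-memory rows. This is only a sketch: the function name `min_z_counts` and the sample rows are hypothetical and not part of the original answer.

```python
def min_z_counts(lines):
    """Return {(x, y): [lowest z seen, number of times that z occurred]}."""
    c = {}
    for line in lines:
        x, y, z = (float(n) for n in line.split())
        entry = c.get((x, y))
        if entry is None:
            # First time this coordinate appears.
            c[(x, y)] = [z, 1]
        elif z < entry[0]:
            # New minimum: store it and restart the count.
            entry[0], entry[1] = z, 1
        elif z == entry[0]:
            # Same minimum seen again.
            entry[1] += 1
    return c

# Made-up rows standing in for input.txt; the (0, 0) rows are interleaved.
sample = ["0 0 5", "0 0 3", "1 0 2", "0 0 3"]
result = min_z_counts(sample)
print(result)
```

Because each coordinate carries its own count, rows for different coordinates can be interleaved without affecting one another.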

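User

Answer 3 · edited 2024-09-30 04:34:37

Why not change the last if to an elif?

As it stands, z < data[i][2] is evaluated on every iteration that does not hit continue, even when the elif branch has already matched.

You could even replace it with else: you have already checked z > data[i][2] and z == data[i][2], so the only remaining possibility is z < data[i][2].

So the following code does the same thing and should be faster: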

        if z > data[i][2]:
            continue
        elif z==data[i][2]:
            ctr = ctr + 1
            data[i][3]=ctr
        else:
            data[i][2] = z
            ctr = 1
            data[i][3]=ctr
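One caveat with the `else` form, assuming the input could ever contain a NaN value for z: every comparison against NaN is false, so `else` would run where the original `if z < data[i][2]` would not. The sketch below mirrors the chain in a hypothetical helper, `classify`, purely to show which branch fires.

```python
import math

def classify(z, current):
    """Mirror the if/elif/else chain and report which branch would run."""
    if z > current:
        return "skip"
    elif z == current:
        return "count"
    else:
        return "replace"

print(classify(3.0, 2.0))       # z above the stored minimum
print(classify(2.0, 2.0))       # z equal to the stored minimum
print(classify(1.0, 2.0))       # z below the stored minimum
print(classify(math.nan, 2.0))  # NaN falls through to the else branch
```

For ordinary finite values the three branches are mutually exclusive, so the `else` version behaves the same as the original.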
