Is there a way to optimize my list comprehension for better performance? It is slower than a for loop

Posted 2024-09-22 16:38:55

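I am trying to optimize my code for looping over an ASC raster file. The function takes a data array from the ASC file with a shape of 1,000 x 1,000 (1 million data points), the ASC file information, and a column skip value. The skip value is not important in this case.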

My function using for loops performs decently and skips an array cell if data == nodata_value. Here is the function:

def asc_process_single(self, asc_array, asc_info, skip=1):
    # ncols = asc_info['ncols']
    nrows = asc_info['nrows']
    xllcornor = asc_info['xllcornor']
    yllcornor = asc_info['yllcornor']
    cellsize = asc_info['cellsize']
    nodata_value = asc_info['nodata_value']

    raster_size_y = cellsize*nrows
    # raster_size_x = cellsize*ncols

    # Looping over array rows and cols with skipping
    xyz = []
    for row in range(asc_array.shape[0])[::skip]:
        for col in range(asc_array.shape[1])[::skip]:
            val_z = asc_array[row, col]  # Z value of datapoint

            # The no data value is not processed
            if val_z == nodata_value:
                pass
            else:
                # Xcoordinate for current Z value
                val_x = xllcornor + (col * cellsize)

                # Ycoordinate for current Z value
                val_y = yllcornor + raster_size_y - (row * cellsize)

                # x, y, z to LIST
                xyz.append([val_x, val_y, val_z])
    return xyz

The timing with 7 repeats on an ASC file that contains nodata_value cells is:

593 ms ± 34.4 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)

I thought I could do better with a list comprehension:

def asc_process_single_listcomprehension(self, asc_array, asc_info, skip=1):
        # ncols = asc_info['ncols']
        nrows = asc_info['nrows']
        xllcornor = asc_info['xllcornor']
        yllcornor = asc_info['yllcornor']
        cellsize = asc_info['cellsize']
        nodata_value = asc_info['nodata_value']

        raster_size_y = cellsize*nrows
        # raster_size_x = cellsize*ncols

        # Looping over array rows and cols with skipping
        rows = range(asc_array.shape[0])[::skip]
        cols = range(asc_array.shape[1])[::skip]
        
        xyz = [[xllcornor + (col * cellsize),
               yllcornor + raster_size_y - (row * cellsize),
               asc_array[row, col]]
               for row in rows for col in cols
               if asc_array[row, col] != nodata_value] 
        
        return xyz

However, this runs slower than my for loop, and I would like to know why:

757 ms ± 58.4 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)

Is it because the list comprehension looks up asc_array[row, col] twice? That operation alone is expensive:

193 ns ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

compared with just assigning the z value from the value already looked up in the array, as my for loop does:

51.2 ns ± 1.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

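Doing this 1 million times would add up to the extra time the list comprehension takes. How can I further optimize my list comprehension so that it performs better than the for loop? Are there other ways to improve performance?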

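Edit: Solution: I tried both of the suggestions given:

  1. Reference my Z value in the list comprehension so that the array is not looked up twice, which takes longer
  2. Rewrite the function to work on numpy arrays

I rewrote the list comprehension as: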

xyz = [[xllcornor + (col * cellsize),
        yllcornor + raster_size_y - (row * cellsize),
        val_z]
       for row in rows for col in cols
       for val_z in [asc_array[row, col]]
       if val_z != nodata_value]
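
To make the effect of the extra `for val_z in [...]` clause visible, here is a minimal sketch that counts element lookups. `CountingGrid` is a hypothetical stand-in for the array (not part of the original code), and the grid is a tiny made-up example:

```python
class CountingGrid:
    """Hypothetical 2-D grid wrapper that counts element lookups."""

    def __init__(self, rows):
        self.rows = rows
        self.lookups = 0

    def __getitem__(self, key):
        row, col = key
        self.lookups += 1
        return self.rows[row][col]


NODATA = -9999
data = [[1, NODATA, 3],
        [NODATA, 5, 6]]

# Version 1: the element is indexed in the condition and again in the result.
grid_twice = CountingGrid(data)
twice = [[c, r, grid_twice[r, c]]
         for r in range(2) for c in range(3)
         if grid_twice[r, c] != NODATA]

# Version 2: a one-element list binds the value to z exactly once per cell.
grid_once = CountingGrid(data)
once = [[c, r, z]
        for r in range(2) for c in range(3)
        for z in [grid_once[r, c]]
        if z != NODATA]

print(grid_twice.lookups, grid_once.lookups)
```

Both comprehensions build the same list; the second indexes each cell once, while the first indexes every valid cell a second time.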

The numpy function became this:

def asc_process_numpy_single(self, asc_array, asc_info, skip):
    # ncols = asc_info['ncols']
    nrows = asc_info['nrows']
    xllcornor = asc_info['xllcornor']
    yllcornor = asc_info['yllcornor']
    cellsize = asc_info['cellsize']
    nodata_value = asc_info['nodata_value']

    raster_size_y = cellsize*nrows
    # raster_size_x = cellsize*ncols

    rows = np.arange(0,asc_array.shape[0],skip)[:,np.newaxis]
    cols = np.arange(0,asc_array.shape[1],skip)

    x = np.zeros((len(rows),len(cols))) + xllcornor + (cols * cellsize)
    y = np.zeros((len(rows),len(cols))) + yllcornor + raster_size_y - (rows * cellsize)
    z = asc_array[::skip,::skip]

    xyz = np.asarray([x,y,z]).T.transpose((1,0,2)).reshape(
        (int(len(rows)*len(cols)), 3))
    mask = (xyz[:,2] != nodata_value)
    xyz = xyz[mask]
    return xyz

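I added the mask in the last two lines of the numpy function because I do not want the nodata_value entries. The timings, in order (for loop, list comprehension, list comprehension with the suggestion, and numpy function with the suggestion), are: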

609 ms ± 44.8 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
706 ms ± 22 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
604 ms ± 21.5 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
70.4 ms ± 1.26 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)

With the optimization, the list comprehension is on par with the for loop, but the numpy function is about 9 times faster.
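
As an aside, the same numpy idea can be written so that the mask is applied before any coordinates are computed. This is only a sketch under assumptions, not the function that was timed above: the name `asc_to_xyz` is hypothetical, and `asc_info` is assumed to carry the same keys (including the `xllcornor` / `yllcornor` spelling) used in the post.

```python
import numpy as np


def asc_to_xyz(asc_array, asc_info, skip=1):
    """Hypothetical variant: mask first, then compute x/y for valid cells only."""
    cellsize = asc_info['cellsize']
    top = asc_info['yllcornor'] + cellsize * asc_info['nrows']

    z = asc_array[::skip, ::skip]
    # Indices of valid cells in the subsampled grid, in row-major order
    # (the same order the nested for loops visit them).
    r, c = np.nonzero(z != asc_info['nodata_value'])

    # Multiply by skip to map subsampled indices back to original indices.
    x = asc_info['xllcornor'] + c * skip * cellsize
    y = top - r * skip * cellsize
    return np.column_stack((x, y, z[r, c]))


info = {'nrows': 4, 'xllcornor': 100, 'yllcornor': 200,
        'cellsize': 10, 'nodata_value': -9999}
grid = np.arange(16.0).reshape(4, 4)
grid[0, 2] = -9999
result = asc_to_xyz(grid, info, skip=2)
print(result)
```

Whether this is faster than the version above depends on how many cells are nodata; it has not been benchmarked here.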

Thanks a lot for your comments and suggestions. I learned a lot today.


2 Answers

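The only thing I can imagine slowing you down is that in the original code you put asc_array[row, col] into a temporary variable, whereas in the list comprehension you evaluate it twice.

There are two things you may want to try:

  1. Use the walrus operator to assign val_z inside the "if" clause, or

  2. Add for val_z in [asc_array[row, col]] after the other two fors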
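
The first option could look roughly like the sketch below. It assumes Python 3.8 or newer (where the walrus operator `:=` exists); the function name `xyz_walrus` and its flat parameter list are hypothetical simplifications of the asker's method:

```python
import numpy as np


def xyz_walrus(asc_array, xll, yll, cellsize, nodata_value, skip=1):
    """Hypothetical helper: bind val_z in the filter, reuse it in the result."""
    nrows, ncols = asc_array.shape
    top = yll + cellsize * nrows
    # The "if" clause runs before the result expression for each cell,
    # so val_z is already bound when the inner list is built.
    return [[xll + col * cellsize, top - row * cellsize, val_z]
            for row in range(0, nrows, skip)
            for col in range(0, ncols, skip)
            if (val_z := asc_array[row, col]) != nodata_value]


arr = np.array([[1.0, -9999.0],
                [3.0, 4.0]])
out = xyz_walrus(arr, 10, 20, 2, -9999.0)
print(out)
```

Each cell is indexed once, in the condition, which is the same saving the second option achieves with a one-element list.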

Good luck

Yes, evaluating the array twice increases the computation time. Here is my test case:

import numpy as np

def funLoop(A):
    xyz = []
    for row in range(A.shape[0]):
        for col in range(A.shape[1]):
            xyz.append([col, row, A[row, col] ])
            
def funListComp1(A):
    xyz = [ [col, row, A[row, col] ] 
           for row in range(A.shape[0]) for col in range(A.shape[1])]

def funListComp2(A):
    xyz = [ [col, A[row, col], A[row, col] ] 
           for row in range(A.shape[0]) for col in range(A.shape[1])]
    
A = np.random.rand(1000,1000)
%timeit funLoop(A)
%timeit funListComp1(A)
%timeit funListComp2(A)
457 ms ± 70.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
378 ms ± 8.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
779 ms ± 309 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

For large data you should always prefer numpy over Python for loops. In your case, the numpy code would look something like this:

def asc_process_single_numpy(asc_array):
    nodata_value = np.nan
    raster_size_y = 1
    skip = 2
    xllcornor = 0
    yllcornor = 0
    cellsize  = 1
    rows = np.arange(0,asc_array.shape[0],skip)[:,np.newaxis]
    cols = np.arange(0,asc_array.shape[1],skip)

    #for row in rows for col in cols
    x = np.zeros((len(rows),len(cols))) + xllcornor + (cols * cellsize)
    y = np.zeros((len(rows),len(cols))) + yllcornor + raster_size_y - (rows * cellsize)
    z = asc_array[::skip,::skip]
    return np.asarray([x,y,z]).T.transpose((1,0,2)).reshape( (int(len(rows)*len(cols)), 3) )

A = np.random.rand(1000,1000)
%timeit asc_process_single(A)
%timeit asc_process_single_listcomprehension(A)
%timeit asc_process_single_numpy(A)
183 ms ± 13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
210 ms ± 2.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
11.3 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
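
One caveat about the snippet above: it sets nodata_value = np.nan but never filters on it. If NaN really were the nodata marker (an assumption; the question's ASC files use a numeric nodata_value), a `!=` comparison would not filter anything, because NaN compares unequal to every value, including itself. A minimal sketch of masking with np.isnan instead:

```python
import numpy as np

z = np.array([[1.0, np.nan],
              [3.0, 4.0]])

# NaN != NaN is True, so this mask keeps every cell, including the NaN one.
naive_mask = z != np.nan

# np.isnan identifies the NaN cells; invert it to keep the valid ones.
nan_mask = ~np.isnan(z)
valid = z[nan_mask]
print(valid)
```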
