使用cython并行化python循环的numpy.searchsorted

try: from setuptools import setup from setuptools import Extension except ImportError: from distutils.core import setup from distutils.extension import Extension from Cython.Build import cythonize import numpy extensions = [Extension("matchOnDistanceVectors", sources=["matchOnDistanceVectors.pyx"], extra_compile_args=["/openmp", "/O2"], extra_link_args=[] )] setup( ext_modules = cythonize(extensions), include_dirs=[numpy.get_include()] )

2条回答

网友

1楼 · 编辑于 2024-10-01 02:19:05

你有点捉襟见肘。您需要GIL来调用numpy.searchsorted，但是GIL阻止任何类型的并行处理。最好的办法是编写自己的nogil版本searchsorted：

cdef mySearchSorted(double[:] array, double target) nogil:
    # binary search implementation

for j in prange(0,N, nogil=True):
    for k in range(A2.shape[0]):
        for L in range(A2.shape[1]):
            C[j, k, L]  = mySearchSorted(A1[j, :], A2[k, L])

numpy.searchsorted也有非常多的开销，因此如果N很大，那么使用您自己的searchsorted来减少开销是有意义的。在

网友

2楼 · 编辑于 2024-10-01 02:19:05

我相信searchsorted实现了GIL本身（参见https://github.com/numpy/numpy/blob/e2805398f9a63b825f4a2aab22e9f169ff65aae9/numpy/core/src/multiarray/item_selection.c，第1664行“NPY_BEGIN_THREADS_DEF”）。在

所以，你可以

for j in prange(0,N, nogil=True):
    with gil:
      C[j,:,:]  = np.searchsorted(A1[j,:], A2, side='left' )

这暂时要求GIL在Python对象上做必要的工作（希望速度很快），然后它应该在searchsorted内再次发布，允许大量并行运行。在

为了更新，我做了一个快速的测试（A1.shape==(105,100)，A2.shape==(302,302)，数字是任意选择的）。对于10次重复，串行版本需要4.5秒，并行版本需要1.4秒（测试在4核CPU上运行）。你不能得到4倍全速，但你接近了。在

它被编译为described in the documentation。我怀疑如果你没有看到加速，那么它可能是：1）你的数组足够小，以至于函数调用/numpy检查类型和大小的开销占主导地位；2）你没有在启用OpenMP的情况下编译它；或者3）你的编译器不支持OpenMP。在

相关问题更多 >

编程相关推荐

热门问题

热门文章