How to correctly release the GIL from the main thread with the Python C API

Posted 2024-06-28 20:05:00

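I am trying to embed Python in a multithreaded C++ program. What I need to do is call two statistical functions through the Python C API to perform a Two Sample Kolmogorov-Smirnov Test and an Anderson-Darling test (scipy.stats.anderson_ksamp) on some data that I collect. So I am only embedding Python in my code, not extending it or using Python functions of my own.

I recently found out that, to run a multithreaded program that uses the Python C API, you have to handle the Global Interpreter Lock (GIL) properly: you need to acquire the GIL before calling Python C API functions and release it once you are done with them.

What I still do not understand is how to properly release the GIL from the main thread so that other threads can execute Python code.

I tried this (option 1):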

 int main(int argc, const char * argv[]) {

    int n = 4;
    std::thread threads[n];

    Py_Initialize();
    PyEval_InitThreads();
    PyEval_SaveThread();
    for (int i = 0; i < n; i++) {
        threads[i] = std::thread(exec, i);
    }
    for (int i = 0; i < n; i++) {
        threads[i].join();
    }
    Py_Finalize();
    return 0;
}

But this gives me a segmentation fault when Py_Finalize() is called.

So I tried this (option 2):

[The code for option 2 was not preserved in the source.]

And this (option 3):

int main(int argc, const char * argv[]) {

    int n = 4;
    std::thread threads[n];

    Py_Initialize();
    PyEval_InitThreads();
    Py_BEGIN_ALLOW_THREADS
    for (int i = 0; i < n; i++) {
        threads[i] = std::thread(exec, i);
    }
    for (int i = 0; i < n; i++) {
        threads[i].join();
    }
    Py_END_ALLOW_THREADS
    Py_Finalize();
    return 0;
}

With the last two options the code runs, but it ends with the following error:

Exception ignored in: <module 'threading' from '/usr/local/opt/python3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/usr/local/opt/python3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1289, in _shutdown
    assert tlock.locked()
AssertionError:

EDIT: the threads run the following code:

double limited_rand(double lower_bound, double upper_bound) {
    return lower_bound + (rand() / (RAND_MAX / (upper_bound-lower_bound) ) );
}


double exec_1(std::vector<int> &left_sample, std::vector<int> &right_sample) {
    PyGILState_STATE gstate = PyGILState_Ensure(); // Acquiring GIL for thread-safe usage Python C API

    PyObject* scipy_stats_module = PyImport_ImportModule("scipy.stats"); // importing "scipy.stats" module

    import_array();
    npy_intp left_nparray_shape[] = {(npy_intp)left_sample.size()}; // Size of left nparray's first dimension
    PyObject* left_sample_nparray = PyArray_SimpleNewFromData(1, left_nparray_shape, NPY_INT, &left_sample[0]); // 1-D NumPy array of ints that wraps (does not copy) the data in left_sample
    npy_intp right_nparray_shape[] = {(npy_intp)right_sample.size()}; // Size of right nparray's first dimension
    PyObject* right_sample_nparray = PyArray_SimpleNewFromData(1, right_nparray_shape, NPY_INT, &right_sample[0]);

    PyObject* ks_2samp = PyObject_GetAttrString(scipy_stats_module, "ks_2samp");
    Py_DecRef(scipy_stats_module);

    PyObject* ks_2samp_return_val = PyObject_CallFunctionObjArgs(ks_2samp, left_sample_nparray, right_sample_nparray, NULL);
    Py_DecRef(ks_2samp);
    Py_DecRef(right_sample_nparray);
    Py_DecRef(left_sample_nparray);

    double p_value = PyFloat_AsDouble(PyTuple_GetItem(ks_2samp_return_val, 1));
    Py_DecRef(ks_2samp_return_val);

    PyGILState_Release(gstate); // Releasing GIL
    return p_value;
}


void initialize_c_2d_int_array(int*& c_array, unsigned long row_length_c_array, std::vector<int> &row1, std::vector<int> &row2) {
    for (unsigned int i = 0; i < row_length_c_array; i++) {
        c_array[i] = row1[i];
        c_array[row_length_c_array + i] = row2[i];
    }
}
double exec_2(std::vector<int> &left_sample, std::vector<int> &right_sample){
    PyGILState_STATE gstate = PyGILState_Ensure(); // Acquiring GIL for thread-safe usage Python C API

    PyObject* scipy_stats_module = PyImport_ImportModule("scipy.stats"); // importing "scipy.stats" module
    // import_array();
    unsigned long n_cols = std::min(left_sample.size(), right_sample.size());
    int* both_samples = (int*) (malloc(2 * n_cols * sizeof(int)));
    initialize_c_2d_int_array(both_samples, n_cols, left_sample, right_sample);
    npy_intp dim3[] = {2, (npy_intp) n_cols};
    PyObject* both_samples_nparray = PyArray_SimpleNewFromData(2, dim3, NPY_INT, both_samples);

    PyObject* anderson_ksamp = PyObject_GetAttrString(scipy_stats_module, "anderson_ksamp");
    Py_DecRef(scipy_stats_module);

    PyObject* anderson_2samp_return_val = PyObject_CallFunctionObjArgs(anderson_ksamp, both_samples_nparray, NULL);
    Py_DecRef(anderson_ksamp);
    Py_DecRef(both_samples_nparray);
    free(both_samples);

    double p_value = PyFloat_AsDouble(PyTuple_GetItem(anderson_2samp_return_val, 2));
    Py_DecRef(anderson_2samp_return_val);

    PyGILState_Release(gstate); // Releasing GIL

    return p_value;
}


void exec(int thread_id) {
    std::vector<int> left_sample;
    std::vector<int> right_sample;

    int n = 50;
    for (int j = 0; j < n; j++) {

        int size = 100;
        for (int i = 0; i < size; i++) {
            left_sample.push_back(limited_rand(0, 100));
            right_sample.push_back(limited_rand(0, 100));
        }

        exec_1(left_sample, right_sample);
        exec_2(left_sample, right_sample);
    }
}

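The only functions of mine that use the Python C API are exec_1 and exec_2; exec just calls them repeatedly on new random data. This is the simplest code I could come up with that mimics the behavior of my real code. For better readability, I have also left out every kind of error checking around the Python C API calls.

If there is no other choice I will run my code as in option 2 or option 3 and ignore the error, but I would really like to understand what is going on. Can you help me?

P.S. I am running Python 3.6.1 with Xcode 8.3.3 on macOS 10.12.5. Please let me know if you need more details.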


1 answer
User
#1 · Posted 2024-06-28 20:05:00

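Option 1

I think you got a segmentation fault because you called PyEval_SaveThread(), which releases the GIL, returns the saved thread state, and sets the current thread state to NULL.

Py_Finalize will try to free all memory associated with the interpreter, and I guess that includes the main thread state. So you can capture this state and restore it before finalizing: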

 PyEval_InitThreads(); // initialize and acquire the GIL
 // release the GIL, store the thread state, and set the current thread state to NULL
 PyThreadState *mainThreadState = PyEval_SaveThread();

 // ... main code segment (start and join the worker threads) ...

 // re-acquire the GIL and restore the current thread state
 PyEval_RestoreThread(mainThreadState);

 Py_Finalize();
 return 0;

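Alternatively, you can call PyEval_ReleaseLock() right after PyEval_InitThreads(), since the main code segment does not appear to use any embedded Python. I had a similar problem, and this seemed to fix it.

Note: the other threads still need to acquire and release the GIL whenever necessary.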