Error when using TensorFlow GPU + Python multiprocessing?

Posted 2024-09-30 01:30:20


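I have noticed a strange behavior when using TensorFlow GPU together with Python multiprocessing.

I have implemented a DCGAN with some customizations and my own dataset. Because I am adjusting the DCGAN for certain features, I have training data as well as test data for evaluation.

Because of the size of the datasets, I wrote data loaders that run concurrently and preload the data into a queue using Python's multiprocessing.

The code is structured roughly like this: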

class ConcurrentLoader:
    def __init__(self, dataset):
        ...

class DCGAN:
    ...

net = DCGAN()
training_data = ConcurrentLoader(path_to_training_data)
test_data = ConcurrentLoader(path_to_test_data)

This code runs fine on TensorFlow CPU and on TensorFlow GPU <= 1.3.0 with CUDA 8.0, but when I run exactly the same code with TensorFlow GPU 1.4.1 and CUDA 9 (the latest TF and CUDA versions as of December 2017), it crashes:

[error output missing from the source]

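What really confuses me is that the error does not occur if I simply remove test_data. So, for some strange reason, TensorFlow GPU 1.4.1 with CUDA 9 works with a single ConcurrentLoader but crashes as soon as more than one loader is initialized.

Even more interesting: after the exception I have to kill the Python process manually, because the GPU's VRAM, the system's RAM, and even the Python processes stay alive after the script has crashed.

Furthermore, there must be some odd connection to Python's multiprocessing module, because when I implement the same model in Keras (using the TF backend!) the code runs just fine with two concurrent loaders. I guess Keras somehow provides an abstraction layer that keeps TF from crashing.

Where could I have messed up with the multiprocessing module so that it causes a crash like this?

Here are the parts of the code that use multiprocessing inside ConcurrentLoader: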

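import multiprocessing as mp
from itertools import cycle

import numpy as np

def __init__(self, dataset):
    ...
    # Bounded queue: holds at most 64 preloaded batches
    self._q = mp.Queue(64)
    self._file_cycler = cycle(img_files)
    # Background worker process that keeps the queue filled
    self._worker = mp.Process(target=self._worker_func, daemon=True)
    self._worker.start()

def _worker_func(self):
    while True:
        ...  # gets next filepaths from self._file_cycler
        buffer = list()
        for im_path in paths:
            ...  # uses OpenCV to load each image & puts it into the buffer
        # Put the whole batch into the queue as one float32 array
        self._q.put(np.array(buffer).astype(np.float32))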

...and that's it.

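Where did I write "unstable" or "non-pythonic" multiprocessing code? I thought daemon=True should ensure that every process is killed as soon as the main process dies? Unfortunately, that is not the case for this particular error.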
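A minimal sketch, not taken from the original post, of shutting a loader down explicitly instead of relying on daemon=True alone. The multiprocessing documentation says a process attempts to terminate its daemonic children when it exits. That attempt is part of normal interpreter shutdown, so it may never run if the parent hangs or dies abruptly inside native code. The names produce, constant_batch and ManagedLoader are hypothetical, and constant_batch merely stands in for the OpenCV loading step.

```python
import multiprocessing as mp
import queue


def produce(q, stop, make_batch):
    """Put batches into q until stop is set, without blocking forever on a full queue."""
    while not stop.is_set():
        batch = make_batch()
        while not stop.is_set():
            try:
                q.put(batch, timeout=0.1)
                break
            except queue.Full:
                continue  # queue still full: re-check the stop flag and retry


def constant_batch():
    # Hypothetical stand-in for loading one batch of images.
    return [0.0, 1.0]


class ManagedLoader:
    """Context manager that owns one worker process and always cleans it up."""

    def __init__(self, make_batch=constant_batch, capacity=64):
        self._q = mp.Queue(capacity)
        self._stop = mp.Event()
        self._worker = mp.Process(
            target=produce, args=(self._q, self._stop, make_batch), daemon=True
        )

    def __enter__(self):
        self._worker.start()
        return self

    def next_batch(self, timeout=30):
        return self._q.get(timeout=timeout)

    def __exit__(self, exc_type, exc, tb):
        # Ask the worker to stop, then fall back to terminate() if it is
        # still alive, e.g. because it is flushing buffered queue items.
        self._stop.set()
        self._worker.join(timeout=5)
        if self._worker.is_alive():
            self._worker.terminate()
            self._worker.join()
        return False
```

Used as `with ManagedLoader() as loader: batch = loader.next_batch()`, the worker is stopped even when the training code inside the with block raises. This only covers exceptions that reach Python; it cannot help if the parent process itself is killed.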

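Did I misuse the default multiprocessing.Process or multiprocessing.Queue here? I thought that simply writing a class which stores batches of images in a queue and exposes them through methods / instance variables would be fine.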
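The thread does not establish the cause of the crash, so the following is only an experiment that might be worth trying, not a confirmed fix. With target=self._worker_func, the bound method carries the whole loader instance into the child process. The sketch below instead passes a module-level function that receives only the queue and the file list, and it starts the worker with the "spawn" start method so the child begins in a fresh interpreter rather than a forked copy of the parent. load_item, worker_loop and SpawnLoader are hypothetical names, and load_item stands in for the OpenCV image load.

```python
import multiprocessing as mp
import queue
from itertools import cycle, islice


def load_item(path):
    # Hypothetical stand-in for the OpenCV image load in the question.
    return [float(len(path))]


def worker_loop(q, files, batch_size, max_batches=None):
    """Push batches into q forever, or max_batches times if given."""
    file_cycler = cycle(files)
    produced = 0
    while max_batches is None or produced < max_batches:
        paths = list(islice(file_cycler, batch_size))
        q.put([load_item(p) for p in paths])  # blocks while the queue is full
        produced += 1


class SpawnLoader:
    def __init__(self, files, batch_size=2, capacity=64):
        ctx = mp.get_context("spawn")  # fresh interpreter, no forked parent state
        self._q = ctx.Queue(capacity)
        self._worker = ctx.Process(
            target=worker_loop,
            args=(self._q, list(files), batch_size),
            daemon=True,
        )
        self._worker.start()

    def next_batch(self, timeout=30):
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            raise RuntimeError("loader worker produced no batch in time") from None

    def close(self):
        self._worker.terminate()
        self._worker.join()


def main():
    # Two loaders at once, mirroring training_data and test_data in the question.
    train = SpawnLoader(["a.png", "bb.png", "ccc.png"])
    test = SpawnLoader(["d.png", "ee.png"])
    try:
        print(train.next_batch())
        print(test.next_batch())
    finally:
        train.close()
        test.close()

# With "spawn", create loaders only from code guarded by
# `if __name__ == "__main__":`, for example by calling main() there.
```

Creating the loaders before the network is built, and keeping all TensorFlow imports out of the worker function, keeps the child processes free of any GPU state. Whether that matters for this particular crash is an assumption.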


1 answer

Anonymous user
#1 · Posted 2024-09-30 01:30:20

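I run into the same error when trying to use TensorFlow with multiprocessing:
E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
but in a different environment: TF 1.4 + CUDA 8.0 + cuDNN 6.0. The matrixMulCUBLAS program from the sample code works fine. I would also like to know the correct solution! The suggestions in "failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED on a AWS p2.xlarge instance" did not work for me.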
