TensorFlow in nvidia-docker: failed call to cuInit: CUDA_ERROR_UNKNOWN

2017-05-16 03:41:47.715682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715896: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715978: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.716002: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.718076: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-05-16 03:41:47.718177: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: 1e22bdaf82f1
2017-05-16 03:41:47.718216: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 1e22bdaf82f1
2017-05-16 03:41:47.718298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 367.57.0
2017-05-16 03:41:47.718398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
"""
2017-05-16 03:41:47.718455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
2017-05-16 03:41:47.718484: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 367.57.0

3 Answers

User

#1 · Edited 2024-09-19 15:56:27

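The problem may be related to file permissions on the JIT cache that CUDA creates for the GPU. On Linux, the cache is created in ~/.nv/ComputeCache by default. Pointing the JIT cache at a different directory can fix the problem. Run

export CUDA_CACHE_PATH=/tmp/nvidia

before running anything on the GPU.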
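If exporting the variable in the shell is inconvenient (for example inside a container entrypoint), the same thing can probably be done from Python. This is only a minimal sketch: the helper name use_writable_cuda_cache is made up for illustration, and it assumes the variable only needs to be set before TensorFlow initializes CUDA, so it is set before the import.

```python
import os
import tempfile


def use_writable_cuda_cache(cache_dir=None):
    """Point CUDA_CACHE_PATH at a directory the current user can write to.

    Hypothetical helper, not part of TensorFlow or CUDA.
    """
    if cache_dir is None:
        # Same idea as /tmp/nvidia in the answer above.
        cache_dir = os.path.join(tempfile.gettempdir(), "nvidia")
    os.makedirs(cache_dir, exist_ok=True)
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError("CUDA cache directory is not writable: " + cache_dir)
    os.environ["CUDA_CACHE_PATH"] = cache_dir
    return cache_dir


cache_dir = use_writable_cuda_cache()
print("CUDA_CACHE_PATH =", cache_dir)

# Import TensorFlow only after the variable is set (assumption: the driver
# reads it when CUDA is first initialized).
# import tensorflow as tf
```

Checking write access up front means a permission problem shows up as a clear Python error instead of a CUDA_ERROR_UNKNOWN later on.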

User

#2 · Edited 2024-09-19 15:56:27

I tried installing nvidia-modprobe, but still got the same error. A simple system reboot then fixed it for me.
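Before rebooting, it may be worth checking whether the NVIDIA device nodes exist at all, since nvidia-modprobe is the tool that normally creates them. The following is a rough sketch, not a definitive diagnosis: the node names in EXPECTED_NODES are an assumption for a single-GPU machine, and missing_nvidia_nodes is a hypothetical helper.

```python
import os

# Assumed device nodes for a machine with one GPU; adjust for your setup.
EXPECTED_NODES = ("nvidiactl", "nvidia-uvm", "nvidia0")


def missing_nvidia_nodes(dev_dir="/dev", expected=EXPECTED_NODES):
    """Return the expected device nodes that are absent from dev_dir."""
    return [
        name for name in expected
        if not os.path.exists(os.path.join(dev_dir, name))
    ]


missing = missing_nvidia_nodes()
if missing:
    print("Missing device nodes:", ", ".join(missing))
else:
    print("All expected NVIDIA device nodes are present.")
```

Inside a container, run it in the container itself, because the nodes have to be visible there and not only on the host.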

User

#3 · Edited 2024-09-19 15:56:27

I run TensorFlow on an Ubuntu 16.04 desktop.

My code used to run fine on the GPU, but today the following code could not find any GPU device:

import tensorflow as tf
from tensorflow.python.client import device_lib as _device_lib

# TensorFlow 1.x API: creating the session initializes CUDA
with tf.Session() as sess:
    # List every device TensorFlow can see (CPU and GPU)
    local_device_protos = _device_lib.list_local_devices()
    print(local_device_protos)
    for x in local_device_protos:
        print(x.name)

When I ran tf.Session(), I noticed the following error:

cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_UNKNOWN

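I checked my NVIDIA driver in System Details, then ran nvcc -V and nvidia-smi to check the driver, CUDA, and cuDNN. Everything looked fine.

Then I opened Additional Drivers to look at the driver details and found that several versions of the NVIDIA driver were listed, with the latest one selected. When I first installed the driver, there had been only one.

So I selected an older version and applied the change.

I ran tf.Session() again and the problem was still there. I figured I should restart the computer, and after the reboot the problem was gone.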
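One way to see which driver version the kernel has actually loaded is to read /proc/driver/nvidia/version, the same file TensorFlow prints as "driver version file contents" in the question's log. The sketch below assumes the file has the format shown in that log (newer drivers may word it differently); both function names are hypothetical. If the loaded version differs from the one you just selected, that would be consistent with a reboot being needed, although this is a guess about the cause.

```python
import re

VERSION_FILE = "/proc/driver/nvidia/version"


def parse_kernel_module_version(contents):
    """Extract the version that follows 'Kernel Module', or None if absent."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", contents)
    return match.group(1) if match else None


def read_loaded_driver_version(path=VERSION_FILE):
    """Return the loaded kernel module version, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return parse_kernel_module_version(fh.read())
    except OSError:
        return None


# Sample line copied from the log in the question.
sample = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 "
          "Mon Oct 3 20:37:01 PDT 2016")
print(parse_kernel_module_version(sample))  # 367.57
print(read_loaded_driver_version())
```

A result of None from read_loaded_driver_version() means the file is missing or in an unexpected format, which usually suggests the kernel module is not loaded.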

sess = tf.Session()
2018-07-01 12:02:41.336648: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-01 12:02:41.464166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-01 12:02:41.464482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.8225
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.27GiB
2018-07-01 12:02:41.464494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-01 12:02:42.308689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-01 12:02:42.308721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-01 12:02:42.308729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-01 12:02:42.309686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7022 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability:
