Cannot use GPU with tensorflow-gpu: "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"

Posted 2024-09-29 01:33:18


Summary of my problem

When I run code with tensorflow-gpu, the error in the title occurs. It happens in every script that contains a convolutional layer.

Environment

  • Ubuntu 18.04
  • Python 3.7.1
  • tensorflow-gpu 1.13.1
  • CUDA 10.1
  • cuDNN 7.4.2

GPU details

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   46C    P8    21W / 215W |    568MiB /  7949MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1733      G   /usr/lib/xorg/Xorg                            18MiB |
|    0      1771      G   /usr/bin/gnome-shell                          57MiB |
|    0      2698      G   /usr/lib/xorg/Xorg                           175MiB |
|    0      2813      G   /usr/bin/gnome-shell                         168MiB |
|    0      3339      G   ...uest-channel-token=11703333986562712743    76MiB |
|    0      8579      G   /proc/self/exe                                67MiB |
+-----------------------------------------------------------------------------+
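CUDNN_STATUS_INTERNAL_ERROR at handle creation is often reported together with GPU memory pressure at cuDNN initialization, so a hedged first check is how much memory is actually free. The sketch below parses the Memory-Usage column of nvidia-smi output like the table above; the function name `free_memory_mib` and the regex are illustrative assumptions, not part of any NVIDIA tool.

```python
import re

# Matches the Memory-Usage column of an nvidia-smi GPU row, e.g. "568MiB /  7949MiB".
# Per-process rows ("18MiB |") have no "/ total" part, so they are skipped.
MEMORY_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def free_memory_mib(nvidia_smi_text):
    """Return a list of (used, total, free) tuples in MiB, one per GPU row found."""
    results = []
    for line in nvidia_smi_text.splitlines():
        match = MEMORY_PATTERN.search(line)
        if match:
            used, total = int(match.group(1)), int(match.group(2))
            results.append((used, total, total - used))
    return results


row = "|  0%   46C    P8    21W / 215W |    568MiB /  7949MiB |      1%      Default |"
print(free_memory_mib(row))  # → [(568, 7949, 7381)]
```

With the values in the table above, roughly 7381 MiB is free before training starts.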


Full error message

2019-06-29 23:13:22.132275: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-06-29 23:13:22.803064: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-06-29 23:13:22.805965: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "train.py", line 90, in <module>
    main(args)
  File "train.py", line 81, in main
    callbacks=[callback]
  File "/home/yudai/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1426, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/yudai/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_generator.py", line 191, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/home/yudai/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1191, in train_on_batch
    outputs = self._fit_function(ins)  # pylint: disable=not-callable
  File "/home/yudai/.local/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
    run_metadata=self.run_metadata)
  File "/home/yudai/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/yudai/.local/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node block1_conv1/Conv2D}}]]
     [[{{node loss/arc_face_loss/broadcast_weights/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat}}]]

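It says "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR", so I suspect cuDNN is the cause. I tried several fixes, such as sudo rm -rf ~/.nv/ from this question and {} from this GitHub issue, but none of them solved the problem.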

Please tell me how to solve this problem.

Thank you.

