TensorFlow: CUDA_ERROR_OUT_OF_MEMORY and tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized

Posted 2024-10-03 02:37:22

When I train a VGG16 network with TensorFlow on the GPU, it keeps reporting CUDA_ERROR_OUT_OF_MEMORY and always stops with the error tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.

Searching online for these messages turned up a few suggestions:

  • Set config.gpu_options.allow_growth to True.
  • Set config.gpu_options.per_process_gpu_memory_fraction to a smaller fraction, such as 0.6.
  • Use a smaller batch size.
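
For reference, the first two suggestions are usually applied through the session config. This is a minimal sketch, assuming the TensorFlow 1.x API that matches the tensorflow-1.1.0 paths in the logs further down. The training script itself is not shown in this post, so where the config would go in it is an assumption.

```python
import tensorflow as tf  # assumes TensorFlow 1.x; on 2.x the same names live under tf.compat.v1

config = tf.ConfigProto()
# Let the GPU allocation grow on demand instead of reserving memory up front.
config.gpu_options.allow_growth = True
# Cap this process at a fraction of the GPU's memory (0.6 is the value from the list above).
config.gpu_options.per_process_gpu_memory_fraction = 0.6

with tf.Session(config=config) as sess:
    # Build and run the training graph here. The options only apply to
    # sessions that are created with this config object.
    pass
```

If the training code creates its Session somewhere else (for example inside a helper), the config has to be passed there, otherwise the settings have no effect. The batch size is set wherever the input pipeline is built, which this sketch does not cover.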

But none of these tips worked; the run behaves exactly as if nothing had changed.

This is my hardware:

  • GPU: NVIDIA GTX 1060
  • Memory: 3 GB + 4 GB (shared memory)

I monitored GPU usage with nvidia-smi; the details are below.

Before the run:

Thu Apr 19 14:21:59 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31                 Driver Version: 388.31                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060   WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P8     7W /  N/A |    587MiB /  3072MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7300    C+G   ...osoft Office\root\Office16\POWERPNT.EXE N/A      |
|    0      8244    C+G   ...6)\Youdao\YoudaoNote\YNoteCefRender.exe N/A      |
|    0      9988    C+G   C:\Windows\explorer.exe                    N/A      |
|    0     10696    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0     10808    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    0     11024    C+G   Insufficient Permissions                   N/A      |
|    0     11092    C+G   C:\Windows\System32\mstsc.exe              N/A      |
|    0     13076    C+G   ...ogram Files (x86)\Skype\Phone\Skype.exe N/A      |
|    0     14664    C+G   ...osoft Office\root\Office16\POWERPNT.EXE N/A      |
+-----------------------------------------------------------------------------+

At process start:

[output not preserved]

After 10 steps:

Thu Apr 19 14:30:40 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31                 Driver Version: 388.31                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060   WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   64C    P2    31W /  N/A |   2595MiB /  3072MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7300    C+G   ...osoft Office\root\Office16\POWERPNT.EXE N/A      |
|    0      9988    C+G   C:\Windows\explorer.exe                    N/A      |
|    0     10696    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0     10808    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    0     11024    C+G   Insufficient Permissions                   N/A      |
|    0     11092    C+G   C:\Windows\System32\mstsc.exe              N/A      |
|    0     13076    C+G   ...ogram Files (x86)\Skype\Phone\Skype.exe N/A      |
|    0     14404      C   ...ools\Anaconda3\envs\py36_tfg\python.exe N/A      |
|    0     14664    C+G   ...osoft Office\root\Office16\POWERPNT.EXE N/A      |
+-----------------------------------------------------------------------------+

After 60 steps: some messages appeared, but training kept running.

2018-04-19 14:33:56.384528: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
2018-04-19 14:33:56.423080: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 1932735232 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
2018-04-19 14:33:56.474281: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 1739461632 bytes on host: CUDA_ERROR_OUT_OF_MEMORY

Thu Apr 19 14:36:13 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31                 Driver Version: 388.31                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060   WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   63C    P2    33W /  N/A |   2602MiB /  3072MiB |     43%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7300    C+G   ...osoft Office\root\Office16\POWERPNT.EXE N/A      |
|    0      9988    C+G   C:\Windows\explorer.exe                    N/A      |
|    0     10696    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0     10808    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    0     11024    C+G   Insufficient Permissions                   N/A      |
|    0     11092    C+G   C:\Windows\System32\mstsc.exe              N/A      |
|    0     13076    C+G   ...ogram Files (x86)\Skype\Phone\Skype.exe N/A      |
|    0     14404      C   ...ools\Anaconda3\envs\py36_tfg\python.exe N/A      |
|    0     14664    C+G   ...osoft Office\root\Office16\POWERPNT.EXE N/A      |
+-----------------------------------------------------------------------------+
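
To compare the snapshots at a glance, the Memory-Usage field can be pulled out of each nvidia-smi table. This is a small sketch that only parses the rows quoted above; `parse_memory_usage` and the snapshot labels are names made up for this illustration.

```python
import re

# Memory-Usage rows copied from the three nvidia-smi snapshots above.
SNAPSHOTS = {
    "before run": "| N/A   50C    P8     7W /  N/A |    587MiB /  3072MiB |      2%      Default |",
    "after 10 steps": "| N/A   64C    P2    31W /  N/A |   2595MiB /  3072MiB |      1%      Default |",
    "after 60 steps": "| N/A   63C    P2    33W /  N/A |   2602MiB /  3072MiB |     43%      Default |",
}


def parse_memory_usage(row):
    """Return (used_mib, total_mib) from an nvidia-smi Memory-Usage row."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", row)
    if match is None:
        raise ValueError("no 'usedMiB / totalMiB' field in row")
    return int(match.group(1)), int(match.group(2))


usage = {label: parse_memory_usage(row) for label, row in SNAPSHOTS.items()}
for label, (used, total) in usage.items():
    print(f"{label}: {used} MiB used, {total - used} MiB free of {total} MiB")
```

Read this way, 587 MiB of the 3072 MiB was already in use by desktop applications before the run, leaving 2485 MiB. After 10 steps only 477 MiB was free, and after 60 steps 470 MiB, so the card was close to full well before the crash.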

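After 170 steps:

About eight hundred lines of messages appeared, and then the process stopped with an error.

The roughly eight hundred lines looked like this: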

2018-04-19 14:49:35.688274: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
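
The sizes in the "failed to alloc ... on host" lines are easier to read in GiB. The sketch below only does arithmetic on the numbers quoted in this post. Reading the shrinking requests as an allocator retrying with smaller sizes is an assumption, not something the log states.

```python
# Allocation sizes (in bytes) copied from the "failed to alloc ... on host" lines in this post.
FAILED_ALLOCS = [2147483648, 1932735232, 1739461632, 4294967296]


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


for size in FAILED_ALLOCS:
    print(f"{size} bytes = {to_gib(size):.2f} GiB")

# Ratio between consecutive requests in the burst logged at 14:33:56.
burst = FAILED_ALLOCS[:3]
ratios = [later / earlier for earlier, later in zip(burst, burst[1:])]
print("ratios between consecutive requests:", [round(r, 3) for r in ratios])
```

The first request in the 14:33:56 burst is exactly 2 GiB and each following one is about 0.9 times the previous; the request at 14:49:35 is exactly 4 GiB. All of them are reported as host allocations, which are separate from the 3072 MiB of GPU memory that nvidia-smi shows.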

It then stopped with the following errors:

Traceback (most recent call last):
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1039, in _do_call
    return fn(*args)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _run_fn
    status, run_metadata)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\contextlib.py", line 88, in __exit__
    next(self.gen)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
         [[Node: input/input/div/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_111_input/input/div", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "vgg16_train_and_test.py", line 212, in <module>
    train()
  File "vgg16_train_and_test.py", line 124, in train
    coord.join(threads)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 234, in _run
    sess.run(enqueue_op)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
    run_metadata_ptr)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
         [[Node: input/input/div/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_111_input/input/div", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
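
The node named in the error is a `_Recv` op, and its attributes say which devices the tensor was moving between. The sketch below pulls them out of the message with plain string parsing (no TensorFlow needed); `parse_node_attrs` is a hypothetical helper written for this illustration.

```python
import re

# Node description copied from the traceback above.
NODE = (
    'input/input/div/_79 = _Recv[client_terminated=false, '
    'recv_device="/job:localhost/replica:0/task:0/cpu:0", '
    'send_device="/job:localhost/replica:0/task:0/gpu:0", '
    'send_device_incarnation=1, tensor_name="edge_111_input/input/div", '
    'tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()'
)


def parse_node_attrs(node):
    """Return the key=value attributes inside the square brackets as a dict."""
    inside = node[node.index("[") + 1 : node.rindex("]")]
    attrs = {}
    # Values are either double-quoted strings or bare tokens up to the next comma.
    for key, quoted, bare in re.findall(r'(\w+)=(?:"([^"]*)"|([^,\]]+))', inside):
        attrs[key] = quoted or bare
    return attrs


attrs = parse_node_attrs(NODE)
sender = attrs["send_device"].rsplit("/", 1)[-1]
receiver = attrs["recv_device"].rsplit("/", 1)[-1]
print(f"{attrs['tensor_name']}: {sender} -> {receiver}")
```

For this message the tensor `edge_111_input/input/div` was being sent from gpu:0 and received on cpu:0. That is at least consistent with the "on host" allocation failures in the log, though the traceback alone does not prove the two are connected.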
