Keras CUDA error: out of memory with small data

Posted 2024-06-23 03:43:15


Model:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_6 (LSTM)                (900, 30)                 4560      
_________________________________________________________________
dense_6 (Dense)              (900, 8)                  248       
=================================================================
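The parameter counts in the summary are consistent with an LSTM of 30 units reading 7 input features (the last dimension of X, i.e. `inputs_n = 7`) followed by a Dense layer of 8 units. A short arithmetic check, assuming the standard Keras LSTM with four gates and one bias vector per gate:

```python
def lstm_params(units, input_dim):
    # 4 gates (input, forget, cell candidate, output); each has an input kernel
    # (input_dim x units), a recurrent kernel (units x units) and a bias (units)
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # weight matrix (input_dim x units) plus one bias per unit
    return input_dim * units + units

print(lstm_params(30, 7))   # 4560
print(dense_params(8, 30))  # 248
```

Both numbers match the `Param #` column, so the 900 in the output shapes is the fixed batch size, not a feature dimension.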

Training code:

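import time

# Assumes `epochs` is an iterable (e.g. range(n_epochs)) and that model, features,
# labels, n_steps, inputs_n, days and split_sequence are defined elsewhere.
for epoch in epochs:
    print('epoch: ', epoch)

    start_time_day = time.time()

    for d in days:
        # Build sliding windows: X -> (samples, n_steps, inputs_n)
        X, y = split_sequence(features, labels, n_steps)
        X = X.reshape(X.shape[0], X.shape[1], inputs_n)
        # A single gradient update on the whole array (900 samples per call)
        history = model.train_on_batch(X, y)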
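`split_sequence` is not shown in the question. A minimal sketch of what such a helper commonly does, assuming `features` has shape `(T, n_features)`, `labels` has shape `(T, n_labels)`, and each window is labelled with the value at its last timestep. This is a hypothetical illustration; the asker's version may differ:

```python
import numpy as np

def split_sequence(features, labels, n_steps):
    """Hypothetical sliding-window splitter (not the asker's code).

    Returns X of shape (T - n_steps + 1, n_steps, n_features) and
    y of shape (T - n_steps + 1, n_labels); y[i] is the label at the
    last timestep of window i.
    """
    features = np.asarray(features, dtype=np.float32)
    labels = np.asarray(labels, dtype=np.float32)
    n_samples = len(features) - n_steps + 1
    # Each window is copied into the stacked array
    X = np.stack([features[i:i + n_steps] for i in range(n_samples)])
    y = labels[n_steps - 1:n_steps - 1 + n_samples]
    return X, y

# Toy usage with small, made-up sizes
T, n_steps, inputs_n, n_labels = 20, 5, 7, 8
features = np.random.rand(T, inputs_n)
labels = np.random.rand(T, n_labels)
X, y = split_sequence(features, labels, n_steps)
X = X.reshape(X.shape[0], X.shape[1], inputs_n)
print(X.shape, y.shape)  # (16, 5, 7) (16, 8)
```

Because the windows overlap, `np.stack` copies each timestep up to `n_steps` times, so the windowed `X` can be much larger than the raw feature array.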

X has shape (900, 11250, 7) with dtype float32, which is about 280 MB.
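The 280 MB figure follows directly from the shape and dtype (float32 is 4 bytes per element):

```python
def array_nbytes(shape, bytes_per_element=4):
    # total elements times element size; 4 bytes for float32
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element

x_bytes = array_nbytes((900, 11250, 7))
print(x_bytes)                    # 283500000
print(x_bytes / 1e6)              # 283.5  (MB)
print(round(x_bytes / 2**20, 1))  # 270.4  (MiB)
```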

I tried this on a GCP VM with a K80 (11 GB RAM) and got a CUDA out-of-memory error (it just keeps looping this error over and over):

name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.562
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-05-12 15:40:43.694393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-12 15:40:45.602743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-12 15:40:45.602828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-05-12 15:40:45.602839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-05-12 15:40:45.603245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10754 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2019-05-12 15:40:56.928097: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
2019-05-12 15:41:05.702532: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-05-12 15:41:05.702600: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 4294967296

…
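The log reports the failed request as host memory ("failed to alloc ... bytes on host", "could not allocate pinned host memory"), not GPU memory. Comparing that request with the size of X:

```python
failed_alloc = 4294967296               # bytes, from the cuda_driver.cc log line
print(failed_alloc == 2**32)            # True
print(failed_alloc / 2**30)             # 4.0  (GiB)
x_bytes = 900 * 11250 * 7 * 4           # X as float32
print(round(failed_alloc / x_bytes, 1)) # 15.1
```

So each failed request is exactly 4 GiB of pinned host memory, roughly 15 times the size of X.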
