Why does a TensorFlow module take up all of the GPU memory?

Posted 2024-06-23 20:02:29


I am training a U-net on TensorFlow 2. When I load the model, it takes up almost all of the GPU's memory (22 GB out of 26 GB), even though my model, with its 190 million parameters, should need at most about 1.5 GB (190 million float32 parameters come to roughly 0.76 GB for the weights alone). To understand the problem, I tried loading a model without any layers, and to my surprise it still took up the same amount of memory. My model code is attached below:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

x = tf.keras.layers.Input(shape=(256,256,1))

model = Sequential(
    [
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        # Activation('relu')(Add()([conv5_0, conv5_2])),  # conv5_0/conv5_2 are never defined, and a residual Add() cannot be wired up inside a Sequential layer list
        MaxPooling2D(pool_size=(2, 2)),

        Conv2D(2048, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(2048, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(2048, 3, padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(1024, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(256, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(128, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        UpSampling2D(size = (2,2)),
        Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'), 
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),
        Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal'),

        Conv2D(1, 3, activation = 'linear', padding = 'same', kernel_initializer = 'he_normal')
    ])

y = model(x)

I commented out all of the layers and it still took up 22 GB. I am running the code in a Jupyter notebook. I expected that adding tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=x) at the top of my notebook would solve the problem, but it did not. My goal is to run several scripts on the GPU at the same time so that I can use my time more efficiently. Any help would be greatly appreciated. Thank you.
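
For context, here is a minimal sketch of the session-based path that per_process_gpu_memory_fraction was designed for (0.25 is just an illustrative value). Constructing the GPUOptions object on its own changes nothing; in the TF1 API it only takes effect once it is attached to a session config, and under TF2 the supported replacements are the tf.config APIs shown in the answers below:

import tensorflow as tf

# Only meaningful when wired into a session config (TF1-style execution):
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.25)  # illustrative fraction
config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
sess = tf.compat.v1.Session(config=config)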

Note: this does not happen only with this code; any other TensorFlow module triggers it as well. For example, somewhere in my code I call tf.signal.ifft2d before loading the model, and it takes up almost the same amount of memory as the model. How can this be solved?


Tags: memory, model, size, activation, kernel, he, relu, same
3 Answers

Further discussion can be found at https://www.tensorflow.org/guide/gpu, which is worth reading: by default, TensorFlow maps nearly all of the GPU memory of every visible GPU as soon as the first operation runs, regardless of how small the model is, which is exactly the behavior you are seeing.

You can have memory allocated dynamically like this:

import tensorflow as tf
from tensorflow.compat.v1.keras.backend import set_session

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
sess = tf.compat.v1.Session(config=config)
set_session(sess)
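
Under TensorFlow 2, the native equivalent of allow_growth is memory growth on the physical devices; here is a minimal sketch using the tf.config API from the guide linked above (it must run before anything touches the GPU):

import tensorflow as tf

# Enable on-demand allocation for every visible GPU. Call this at the very
# top of the notebook, before any op initializes the GPU, or it raises a
# RuntimeError.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)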

You need to limit the GPU memory growth; you can find sample code on the TensorFlow page.

I have copied the code snippet here as well:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
        # Visible devices must be set before GPUs have been initialized
        print(e)
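
Since the goal is to run several scripts on one GPU, a hard memory cap may be more useful than restricting visible devices; the same TensorFlow guide shows a virtual-device configuration for this, which is the TF2 counterpart of per_process_gpu_memory_fraction. A minimal sketch (2048 MB is just an illustrative limit):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Create a logical GPU capped at 2048 MB on the first physical GPU;
        # TensorFlow will not allocate beyond this limit.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
    except RuntimeError as e:
        # Must run before the GPU has been initialized
        print(e)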

I have run into the same problem in some of my projects, and I noticed that GPU memory problems appear when the batch size is large. Try setting the batch size as small as possible; when the model is complex, I start with a batch size of 1, as in the sketch below.
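
A minimal sketch, assuming model is the Keras model defined above; the dummy arrays and the mse loss are placeholders for your own data and objective:

import numpy as np

# Dummy data purely for illustration; substitute your real arrays.
x_train = np.random.rand(8, 256, 256, 1).astype('float32')
y_train = np.random.rand(8, 256, 256, 1).astype('float32')

model.compile(optimizer='adam', loss='mse')
# batch_size=1 keeps the per-step activation memory at its minimum.
model.fit(x_train, y_train, batch_size=1, epochs=1)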
