Getting RuntimeError: Unable to create link (name already exists) with a multi-input Keras model

Posted 2024-09-29 18:35:40


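I cannot save my Keras model because I get the error in the title. I am using tensorflow-gpu. My model has 4 inputs, and each input feeds a ResNet50. With a single input the callbacks below work fine, but with multiple inputs I get the following error: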

RuntimeError: Unable to create link (name already exists)

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [EarlyStopping(monitor='val_loss', patience=30, mode='min', min_delta=0.0001, verbose=1),
    ModelCheckpoint(checkpoint_path, monitor='val_loss', save_best_only=True, mode='min', verbose=1)
]

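Without the callbacks, I could not save the model at the end of training either, because I hit the same error, but I was able to fix that with the following code, found here: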

import tensorflow as tf
from tensorflow.python.keras import backend as K

# Intended to give each optimizer weight a unique name before saving
with K.name_scope(model.optimizer.__class__.__name__):
    for i, var in enumerate(model.optimizer.weights):
        name = 'variable{}'.format(i)
        model.optimizer.weights[i] = tf.Variable(var, name=name)

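This code only works for the single-input model, and it goes after the training call model.fit.

With the callbacks, even the code above does not help. This post is somewhat related to my previous one.

I read that this problem can be solved with tf-nightly, so I tried it, but without success.

I tested it with a standalone code and generated data in a Google Colab and it worked. So I checked the tf version there, and it is the same as mine, 2.3.0. As for CUDA, both Colab and my machine are running: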

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Could this be the problem?

Update:

Here is the error output:

113/113 [==============================] - ETA: 0s - loss: 30.0107 - mae: 1.3525
Epoch 00001: val_loss improved from inf to 0.18677, saving model to saved_models/multi_channel_model.h5
Traceback (most recent call last):
  File "fine_tuning.py", line 111, in <module>
    run()
  File "fine_tuning.py", line 104, in run
    model.fit(x=train_x_list, y=train_y, validation_split=0.2,
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1137, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 412, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1249, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1301, in _save_model
    self.model.save(filepath, overwrite=True, options=self._options)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1978, in save
    save.save_model(self, filepath, overwrite, include_optimizer, save_format,
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 130, in save_model
    hdf5_format.save_model_to_hdf5(
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 125, in save_model_to_hdf5
    save_optimizer_weights_to_hdf5_group(f, model.optimizer)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 593, in save_optimizer_weights_to_hdf5_group
    param_dset = weights_group.create_dataset(
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 139, in create_dataset
    self[name] = dset
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 373, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)
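
The last frames of the traceback show save_optimizer_weights_to_hdf5_group calling create_dataset, which fails because an HDF5 group cannot hold two datasets with the same name. A minimal diagnostic sketch, assuming the weight names can be collected as plain strings; the helper name find_duplicate_names is hypothetical and not part of Keras or h5py, and the example names are made up for illustration:

```python
from collections import Counter


def find_duplicate_names(names):
    """Return a dict mapping each name that occurs more than once to its count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical usage with a compiled Keras model (not run here):
#   dupes = find_duplicate_names(w.name for w in model.optimizer.weights)
#   print(dupes)

# Stand-in names, invented for illustration only.
example = ["conv1/kernel:0", "conv1/bias:0", "conv1/kernel:0"]
print(find_duplicate_names(example))  # {'conv1/kernel:0': 2}
```

If this returns a non-empty dict for the optimizer weights or model weights, those are the names likely to collide when the model is written to an .h5 file.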

2 answers

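I was able to get around the RuntimeError caused by duplicate variable names, which occurs when using several instances of a pretrained model and saving them to h5, by modifying a protected attribute. This is generally not recommended, but in my case I needed a solution now rather than something future-proof. I am working with {} on {}.

After creating the combined model my_model, I placed the following before compiling. Training and saving checkpoints then worked as expected.

Edit: note that in my case, after loading the combined model's h5 file, the same step has to be repeated if you want to save it again.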

    # Append the weight's index to both name parts so every name is unique
    for i, w in enumerate(my_model.weights):
        split_name = w.name.split('/')
        new_name = split_name[0] + '_' + str(i) + '/' + split_name[1] + '_' + str(i)
        my_model.weights[i]._handle_name = new_name
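
To see what the renaming in the loop above produces, here is the same string manipulation as a standalone function. This is only a sketch: the helper name indexed_name and the sample weight names are hypothetical, and it touches plain strings rather than real Keras variables.

```python
def indexed_name(name, i):
    """Apply the loop's renaming rule to a plain string."""
    split_name = name.split('/')
    return split_name[0] + '_' + str(i) + '/' + split_name[1] + '_' + str(i)


# Two weights that share a name become distinct once the index is appended.
print(indexed_name("conv1_conv/kernel:0", 3))  # conv1_conv_3/kernel:0_3
print(indexed_name("conv1_conv/kernel:0", 4))  # conv1_conv_4/kernel:0_4

# Only the first two '/'-separated parts are kept.
print(indexed_name("a/b/c:0", 0))  # a_0/b_0
```

Because the index differs for every weight, the resulting names cannot collide. Note that a name with more than one '/' loses everything after its second part, and a name with no '/' would raise an IndexError.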

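In my case, modifying the combined model's optimizer.weights, as in the suggestion you mentioned, did not help at all. I also chose to load the pretrained models with load_model(compile=False) to drop their optimizer weights.

Here is another discussion I found about this, with a similar "solution" in the comments.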

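  1. Try CUDA 10.1. https://www.tensorflow.org/install/gpu states "TensorFlow supports CUDA® 10.1".

  2. There may be a problem with the ModelCheckpoint callback. Check that the checkpoint path location is writable. The reference also says "if save_best_only=True, the latest best model according to the quantity monitored will not be overwritten." So you may want to delete the last saved model each time you run the model, or supply a new unique name in the checkpoint path. It is probably refusing to overwrite the previous model and raising the error.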
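
The suggestion in point 2 (a writable location and a unique file name per run) could be sketched as follows. The helper name unique_checkpoint_path and its parameters are hypothetical; the default stem is taken from the file name in the question's log. The thread does not confirm that this resolves the "name already exists" error.

```python
import os
from datetime import datetime


def unique_checkpoint_path(directory, stem="multi_channel_model", ext=".h5"):
    """Build a timestamped checkpoint path after checking the directory is writable."""
    os.makedirs(directory, exist_ok=True)
    if not os.access(directory, os.W_OK):
        raise PermissionError("checkpoint directory is not writable: " + directory)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return os.path.join(directory, stem + "_" + stamp + ext)


# Hypothetical usage (not run here):
#   checkpoint_path = unique_checkpoint_path("saved_models")
#   ModelCheckpoint(checkpoint_path, monitor='val_loss', save_best_only=True)
```

The timestamp has one-second resolution, so two runs started within the same second would still get the same path.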
