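I cannot save my Keras model because I get the error mentioned in the title. I have been using tensorflow-gpu. My model has 4 inputs, each of which is a ResNet50. The callbacks below work fine when I use only one input, but with multiple inputs I get the following error: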
RuntimeError: Unable to create link (name already exists)
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
callbacks = [
    EarlyStopping(monitor='val_loss', patience=30, mode='min', min_delta=0.0001, verbose=1),
    ModelCheckpoint(checkpoint_path, monitor='val_loss', save_best_only=True, mode='min', verbose=1),
]
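Without the callbacks, I could not save the model at the end of training either, because I hit the same error. I was able to fix that case with the following code, which I found elsewhere: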
import tensorflow as tf
from tensorflow.python.keras import backend as K
# Re-create every optimizer weight under a distinct name
with K.name_scope(model.optimizer.__class__.__name__):
    for i, var in enumerate(model.optimizer.weights):
        name = 'variable{}'.format(i)
        model.optimizer.weights[i] = tf.Variable(var, name=name)
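This code works only for the single-input model, and it goes after the call to the training function model.fit.
With the callbacks, even the code above does not help. This post is somewhat related to my previous one.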
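I have read that this problem can be solved with tf-nightly, so I tried it, but it did not help.
I also tested a standalone script with generated data in Google Colab, and it worked. So I checked the TensorFlow version there, and it is the same as mine: 2.3.0. As for CUDA, both Colab and my machine are running: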
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Could this be the problem?
Update:
Here is the error output:
113/113 [==============================] - ETA: 0s - loss: 30.0107 - mae: 1.3525
Epoch 00001: val_loss improved from inf to 0.18677, saving model to saved_models/multi_channel_model.h5
Traceback (most recent call last):
File "fine_tuning.py", line 111, in <module>
run()
File "fine_tuning.py", line 104, in run
model.fit(x=train_x_list, y=train_y, validation_split=0.2,
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
return method(self, *args, **kwargs)
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1137, in fit
callbacks.on_epoch_end(epoch, epoch_logs)
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 412, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1249, in on_epoch_end
self._save_model(epoch=epoch, logs=logs)
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1301, in _save_model
self.model.save(filepath, overwrite=True, options=self._options)
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1978, in save
save.save_model(self, filepath, overwrite, include_optimizer, save_format,
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 130, in save_model
hdf5_format.save_model_to_hdf5(
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 125, in save_model_to_hdf5
save_optimizer_weights_to_hdf5_group(f, model.optimizer)
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 593, in save_optimizer_weights_to_hdf5_group
param_dset = weights_group.create_dataset(
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 139, in create_dataset
self[name] = dset
File "/home/abderrezzaq/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 373, in __setitem__
h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)
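The last frames show h5py failing inside create_dataset while the optimizer weights are being written, which suggests that two weights are being saved under the same name in one HDF5 group. One way to confirm this is to look for repeated names. The sketch below is only an illustration: find_duplicate_names is a hypothetical helper, the sample names are invented, and collecting the real names from model.optimizer.weights is an assumption about how one would gather them.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, with their counts."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Invented sample names. With a real model one might pass
# [w.name for w in model.optimizer.weights] instead (assumption).
sample = [
    'Adam/iter:0',
    'Adam/conv1_conv/kernel/m:0',
    'Adam/conv1_conv/kernel/m:0',
    'Adam/conv1_conv/bias/m:0',
]
print(find_duplicate_names(sample))
```

An empty result would mean the names are already distinct and the cause lies elsewhere.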
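I was able to work around the duplicate variable names that cause the RuntimeError when using several instances of pre-trained models and saving them to h5, by modifying a protected attribute. This is generally not recommended, but in my case I needed a solution now rather than a future-proof one. After creating the combined model my_model, I applied the change before compiling. Training and saving checkpoints then worked as expected.
Edit: note that in my case, when loading the combined model's h5 file, the same steps have to be repeated if the model is to be saved again.
In my case, modifying the combined model's optimizer.weights, as in the suggestion you mentioned, did not help. I also chose to load the pre-trained models with load_model(compile=False) to drop their optimizer weights.
I also found another discussion of this issue, with a similar "solution" in the comments.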
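A minimal sketch of the renaming idea, under stated assumptions: make_unique_names is a hypothetical helper that works on plain name strings, and appending the weight's index to each path component is just one possible scheme. Writing the new names back to a real model would mean assigning them to a protected attribute of each weight, which depends on the TensorFlow version and is deliberately left out.

```python
def make_unique_names(names):
    """Return new names in which every entry is distinct.

    Hypothetical helper: appends the entry's position to each
    slash-separated component and keeps any ':N' suffix unchanged.
    """
    unique = []
    for i, name in enumerate(names):
        base, sep, suffix = name.partition(':')  # split off e.g. ':0'
        parts = [f'{part}_{i}' for part in base.split('/')]
        unique.append('/'.join(parts) + sep + suffix)
    return unique


# Invented sample: several copies of one pre-trained network may
# expose identically named weights.
names = ['conv1_conv/kernel:0', 'conv1_conv/bias:0', 'conv1_conv/kernel:0']
renamed = make_unique_names(names)
print(renamed)
```

Because the index is part of every new name, the result is distinct even when all input names are identical.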
Try CUDA 10.1. https://www.tensorflow.org/install/gpu states that "TensorFlow supports CUDA® 10.1".
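There is a problem with the ModelCheckpoint callback. Is the checkpoint path location writable? Also, the reference states: "if save_best_only=True, the latest best model according to the quantity monitored will not be overwritten." So you may want to delete the last saved model, or provide a new unique name in the checkpoint path, each time you run the model. Most likely it is refusing to overwrite the previous model and raising the error.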
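Both suggestions, checking that the location is writable and using a fresh file name per run, can be sketched with the standard library alone. This is an illustration under assumptions: unique_checkpoint_path is a hypothetical helper, the timestamp format is arbitrary, and the names are unique only if runs start at least one second apart.

```python
import os
import tempfile
from datetime import datetime


def unique_checkpoint_path(directory, stem='multi_channel_model', ext='.h5'):
    """Build a timestamped file path inside `directory`.

    Creates the directory if needed and raises PermissionError
    when it is not writable.
    """
    os.makedirs(directory, exist_ok=True)
    if not os.access(directory, os.W_OK):
        raise PermissionError(f'cannot write to {directory}')
    stamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    return os.path.join(directory, f'{stem}_{stamp}{ext}')


# Demo in a temporary directory; a real run might use 'saved_models'.
demo_dir = tempfile.mkdtemp()
checkpoint_path = unique_checkpoint_path(demo_dir)
print(checkpoint_path)
```

The resulting checkpoint_path could then be passed to ModelCheckpoint in place of a fixed file name.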