<p>This answer does not apply to the question as asked, but this page is the top Google result for <code>keras "KeyError: 'val_loss'"</code>, so I will share the solution to my problem.</p>
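<p>The error was the same for me: when using <code>val_loss</code> in the checkpoint file name, I got <code>KeyError: 'val_loss'</code>. My checkpointer was also monitoring this field, so even when I took the field out of the file name, I still got this warning from the checkpointer: <code>WARNING:tensorflow:Can save best model only with val_loss available, skipping.</code></p>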
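<p>In my case, the issue was that I was upgrading from using Keras and Tensorflow 1 separately to using the Keras that ships with Tensorflow 2. The <code>period</code> parameter of <code>ModelCheckpoint</code> had been replaced with <code>save_freq</code>. I wrongly assumed that <code>save_freq</code> behaved the same way, so I set <code>save_freq=1</code>, thinking this would save every epoch. However, the <a href="https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint?version=stable" rel="nofollow noreferrer">docs</a> state:</p>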
<blockquote>
<p>save_freq: 'epoch' or integer. When using 'epoch', the callback saves the model after each epoch. When using integer, the callback saves the model at end of a batch at which this many samples have been seen since last saving. Note that if the saving isn't aligned to epochs, the monitored metric may potentially be less reliable (it could reflect as little as 1 batch, since the metrics get reset every epoch). Defaults to 'epoch'</p>
</blockquote>
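<p>To see why an integer <code>save_freq</code> can end in this <code>KeyError</code>, here is a minimal sketch in plain Python. It is not the Keras source: <code>checkpoint_name</code>, the file name template, and the metric values are made up for illustration. It assumes the callback fills the file name template from whatever logs exist at save time, and that validation metrics are only present at the end of an epoch.</p>
<pre><code class="language-python"># Simplified stand-in for how a checkpoint callback builds its file name.
# This is NOT the Keras implementation; it only mimics the formatting step.

FILEPATH = "weights.{epoch:02d}-{val_loss:.2f}.h5"  # hypothetical template


def checkpoint_name(filepath, epoch, logs):
    """Fill the template from the logs available at the moment of saving."""
    return filepath.format(epoch=epoch + 1, **logs)


# Logs at the end of a training batch: training metrics only.
batch_logs = {"loss": 0.71}

# Logs at the end of an epoch: validation metrics have been added.
epoch_logs = {"loss": 0.52, "val_loss": 0.60}

# Saving at epoch end works because val_loss is in the logs.
print(checkpoint_name(FILEPATH, 0, epoch_logs))  # weights.01-0.60.h5

# Saving mid-epoch fails because val_loss has not been computed yet.
try:
    checkpoint_name(FILEPATH, 0, batch_logs)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'val_loss'
</code></pre>
<p>With <code>save_freq=1</code> the save is attempted at the end of a batch, which corresponds to the second call above.</p>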
<p>Setting <code>save_freq='epoch'</code> solved the issue for me. <strong>Note: the OP was still using <code>period=1</code>, so this is definitely not what was causing their issue.</strong></p>
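<p>For reference, a checkpoint set up this way might look roughly like the sketch below. The file name template and the commented-out <code>fit</code> call (including <code>model</code>, <code>x</code>, <code>y</code>, <code>x_val</code>, <code>y_val</code>) are placeholders of mine, not taken from the question.</p>
<pre><code class="language-python">import tensorflow as tf

# Hypothetical file name template; val_loss is only available at epoch end.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="weights.{epoch:02d}-{val_loss:.2f}.h5",
    monitor="val_loss",
    save_best_only=True,
    save_freq="epoch",  # save once per epoch, like the old period=1
)

# val_loss only exists if fit() is given validation data, for example:
# model.fit(x, y, validation_data=(x_val, y_val), callbacks=[checkpoint])
</code></pre>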