Keras model parameters are all "NaN" after reloading


I am doing transfer learning with ResNet50. I created a new model on top of the pretrained (imagenet) model provided by Keras.

After training my new model, I saved it like this:

# Save the Siamese Network architecture
siamese_model_json = siamese_network.to_json()
with open("saved_model/siamese_network_arch.json", "w") as json_file:
    json_file.write(siamese_model_json)
# save the Siamese Network model weights
siamese_network.save_weights('saved_model/siamese_model_weights.h5')

Later, I reloaded it as follows to make some predictions:

(The exact snippet was garbled in the original post; below is a minimal sketch, assuming the standard model_from_json / load_weights counterpart of the save code above.)
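from keras.models import model_from_json

# Rebuild the architecture from the saved JSON
with open("saved_model/siamese_network_arch.json", "r") as json_file:
    loaded_model_json = json_file.read()
siamese_network = model_from_json(loaded_model_json)

# Restore the trained weights
siamese_network.load_weights('saved_model/siamese_model_weights.h5')

# If deserialization complains about the custom Lambda/loss functions, pass
# them explicitly, e.g.:
# model_from_json(loaded_model_json, custom_objects={'triplet_loss': triplet_loss})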

Then I check whether the weights look reasonable, like this (for one of the layers):

print("bn3d_branch2c:\n",
      siamese_network.get_layer('model_1').get_layer('bn3d_branch2c').get_weights())

If I train my network for only 1 epoch, I see reasonable values...

But if I train my model for 18 epochs (which takes 5-6 hours, as I have a very slow computer), I see only NaN values, like this:

bn3d_branch2c:
 [array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       ...

What is the trick here?

Appendix 1:

Here is how I create my model.

First, here is the triplet loss function that I will use later.

from keras import backend as K

def triplet_loss(inputs, dist='euclidean', margin='maxplus'):
    anchor, positive, negative = inputs
    positive_distance = K.square(anchor - positive)
    negative_distance = K.square(anchor - negative)
    if dist == 'euclidean':
        positive_distance = K.sqrt(K.sum(positive_distance, axis=-1, keepdims=True))
        negative_distance = K.sqrt(K.sum(negative_distance, axis=-1, keepdims=True))
    elif dist == 'sqeuclidean':
        positive_distance = K.sum(positive_distance, axis=-1, keepdims=True)
        negative_distance = K.sum(negative_distance, axis=-1, keepdims=True)
    loss = positive_distance - negative_distance
    if margin == 'maxplus':
        # hinge loss: max(0, margin + d(a, p) - d(a, n)), with a margin of 2
        loss = K.maximum(0.0, 2 + loss)
    elif margin == 'softplus':
        # smooth approximation of the hinge: log(1 + exp(loss))
        loss = K.log(1 + K.exp(loss))

    return K.mean(loss)
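As a quick sanity check of the arithmetic (a plain NumPy sketch, not part of the original code):

import numpy as np

# Toy embeddings: the positive is close to the anchor, the negative is far
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])

d_pos = np.sqrt(np.sum((anchor - positive) ** 2))  # ~0.141
d_neg = np.sqrt(np.sum((anchor - negative) ** 2))  # 2.0

loss = max(0.0, 2 + d_pos - d_neg)  # ~0.141: still inside the margin of 2
print(loss)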

Here is how I build my model from start to finish. I give the full code to paint the exact picture.

from keras.applications.resnet50 import ResNet50
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras.optimizers import Adam

model = ResNet50(weights='imagenet')

# Remove the last layer (Needed to later be able to create the Siamese Network model)
model.layers.pop()

# First freeze all layers of ResNet50. Transfer Learning to be applied.
for layer in model.layers:
    layer.trainable = False

# All Batch Normalization layers still need to be trainable so that the "mean"
# and "standard deviation (std)" params can be updated with the new training data
model.get_layer('bn_conv1').trainable = True
model.get_layer('bn2a_branch2a').trainable = True
model.get_layer('bn2a_branch2b').trainable = True
model.get_layer('bn2a_branch2c').trainable = True
model.get_layer('bn2a_branch1').trainable = True
model.get_layer('bn2b_branch2a').trainable = True
model.get_layer('bn2b_branch2b').trainable = True
model.get_layer('bn2b_branch2c').trainable = True
model.get_layer('bn2c_branch2a').trainable = True
model.get_layer('bn2c_branch2b').trainable = True
model.get_layer('bn2c_branch2c').trainable = True
model.get_layer('bn3a_branch2a').trainable = True
model.get_layer('bn3a_branch2b').trainable = True
model.get_layer('bn3a_branch2c').trainable = True
model.get_layer('bn3a_branch1').trainable = True
model.get_layer('bn3b_branch2a').trainable = True
model.get_layer('bn3b_branch2b').trainable = True
model.get_layer('bn3b_branch2c').trainable = True
model.get_layer('bn3c_branch2a').trainable = True
model.get_layer('bn3c_branch2b').trainable = True
model.get_layer('bn3c_branch2c').trainable = True
model.get_layer('bn3d_branch2a').trainable = True
model.get_layer('bn3d_branch2b').trainable = True
model.get_layer('bn3d_branch2c').trainable = True
model.get_layer('bn4a_branch2a').trainable = True
model.get_layer('bn4a_branch2b').trainable = True
model.get_layer('bn4a_branch2c').trainable = True
model.get_layer('bn4a_branch1').trainable = True
model.get_layer('bn4b_branch2a').trainable = True
model.get_layer('bn4b_branch2b').trainable = True
model.get_layer('bn4b_branch2c').trainable = True
model.get_layer('bn4c_branch2a').trainable = True
model.get_layer('bn4c_branch2b').trainable = True
model.get_layer('bn4c_branch2c').trainable = True
model.get_layer('bn4d_branch2a').trainable = True
model.get_layer('bn4d_branch2b').trainable = True
model.get_layer('bn4d_branch2c').trainable = True
model.get_layer('bn4e_branch2a').trainable = True
model.get_layer('bn4e_branch2b').trainable = True
model.get_layer('bn4e_branch2c').trainable = True
model.get_layer('bn4f_branch2a').trainable = True
model.get_layer('bn4f_branch2b').trainable = True
model.get_layer('bn4f_branch2c').trainable = True
model.get_layer('bn5a_branch2a').trainable = True
model.get_layer('bn5a_branch2b').trainable = True
model.get_layer('bn5a_branch2c').trainable = True
model.get_layer('bn5a_branch1').trainable = True
model.get_layer('bn5b_branch2a').trainable = True
model.get_layer('bn5b_branch2b').trainable = True
model.get_layer('bn5b_branch2c').trainable = True
model.get_layer('bn5c_branch2a').trainable = True
model.get_layer('bn5c_branch2b').trainable = True
model.get_layer('bn5c_branch2c').trainable = True

# Used when compiling the siamese network. The network's output is already
# the triplet loss value, so this simply averages y_pred; the "- 0 * y_true"
# term only keeps Keras from complaining about unused targets.
def identity_loss(y_true, y_pred):
    return K.mean(y_pred - 0 * y_true)

# Create the siamese network

x = model.get_layer('flatten_1').output # layer 'flatten_1' is the last layer of the model
model_out = Dense(128, activation='relu',  name='model_out')(x)
model_out = Lambda(lambda  x: K.l2_normalize(x,axis=-1))(model_out)

new_model = Model(inputs=model.input, outputs=model_out)

anchor_input = Input(shape=(224, 224, 3), name='anchor_input')
pos_input = Input(shape=(224, 224, 3), name='pos_input')
neg_input = Input(shape=(224, 224, 3), name='neg_input')

encoding_anchor   = new_model(anchor_input)
encoding_pos      = new_model(pos_input)
encoding_neg      = new_model(neg_input)

loss = Lambda(triplet_loss)([encoding_anchor, encoding_pos, encoding_neg])

siamese_network = Model(inputs  = [anchor_input, pos_input, neg_input], 
                        outputs = loss) # Note that the output of the model is the 
                                        # return value from the triplet_loss function above

siamese_network.compile(optimizer=Adam(lr=.0001), loss=identity_loss)
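For completeness, training might then be invoked like this (a sketch with random arrays standing in for real, preprocessed triplets; the targets are dummies because identity_loss ignores y_true):

import numpy as np

n = 4  # tiny batch just to illustrate the call
anchors   = np.random.rand(n, 224, 224, 3).astype('float32')
positives = np.random.rand(n, 224, 224, 3).astype('float32')
negatives = np.random.rand(n, 224, 224, 3).astype('float32')
dummy_y   = np.zeros((n, 1), dtype='float32')

siamese_network.fit([anchors, positives, negatives], dummy_y,
                    epochs=1, batch_size=2)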

One thing to note is that I make all the Batch Normalization layers "trainable" so that the BN-related parameters can be updated with my training data. This produces many lines, but I could not find a shorter solution.
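A shorter equivalent should be possible by filtering on the layer class (a sketch, not from the original post, assuming all of those "bn..." layers are keras.layers.BatchNormalization instances):

from keras.layers import BatchNormalization

# Make every Batch Normalization layer trainable in one pass
for layer in model.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = True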


1 answer

Answered on 2024-09-28 03:20:08

This solution was inspired by @Gurmeet Singh's suggestion above.

It seems that during training, the weights of the trainable layers became so large after a while that they were all set to NaN. This had made me think I was saving and reloading the model the wrong way, but the real problem was exploding gradients.

A similar issue can be seen on GitHub: github.com/keras-team/keras/issues/2378 There, using a lower learning rate is suggested to avoid the problem.

Two solutions are discussed in this link (Keras ML library: how to do weight clipping after gradient updates? TensorFlow backend):

- Using the clipvalue parameter in the optimizer, which simply clips each computed gradient value as configured. But this is not the recommended solution.
- The second is the clipnorm parameter, which simply rescales the computed gradients whenever their L2 norm exceeds a user-given value.
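In optimizer terms, the two options look like this (a sketch; the values are arbitrary):

from keras.optimizers import Adam

# clipvalue: clip each individual gradient element into [-0.5, 0.5]
opt_clipvalue = Adam(lr=0.0001, clipvalue=0.5)

# clipnorm: rescale the whole gradient whenever its L2 norm exceeds 1
opt_clipnorm = Adam(lr=0.0001, clipnorm=1.0)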

I also considered using input normalization (to avoid exploding gradients), but then figured out that it is already done in the preprocess_input(..) function. (See this link for details: https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/preprocess_input) It is also possible to set the mode parameter to "tf" (it otherwise defaults to "caffe"), which might help further (because mode="tf" scales pixels between -1 and 1), but I did not try it.
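For example (a sketch, assuming the keras.applications.imagenet_utils variant, which exposes the mode parameter):

import numpy as np
from keras.applications.imagenet_utils import preprocess_input

x = np.random.randint(0, 256, size=(1, 224, 224, 3)).astype('float32')

x_caffe = preprocess_input(x.copy())             # default mode='caffe': BGR reordering + ImageNet mean subtraction
x_tf    = preprocess_input(x.copy(), mode='tf')  # scales pixels into the range [-1, 1]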

To summarize, I changed two things when compiling the model to be trained.

The changed line is as follows:

Before the change:

siamese_network.compile(optimizer=Adam(lr=.0001),
                        loss=identity_loss)

After the change:

(The exact snippet was garbled in the original post; the following is a sketch combining a smaller learning rate, with an illustrative value, and clipnorm=1 as described below.)
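# Illustrative: the post only states that a smaller learning rate was used
# together with clipnorm=1; the exact learning-rate value was not preserved.
siamese_network.compile(optimizer=Adam(lr=.00001, clipnorm=1),
                        loss=identity_loss)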

1) I used a smaller learning rate to make the gradient updates smaller.
2) I used the clipnorm parameter to normalize the computed gradients and clip them.

I trained my network for 10 more epochs. The loss now decreases slowly, but I no longer have any problem saving and reloading the model. (At least not after 10 epochs, which already takes a while on my computer.)

Note that I set the value of clipnorm to 1. This means that the L2 norm of the gradients is computed first, and if that norm exceeds the value "1", the gradients are clipped accordingly. I assume this is a hyperparameter that can be tuned: it affects the time needed to train the model, while helping to avoid the exploding gradient problem.
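Numerically, this is what clipnorm=1 does to a gradient vector (plain NumPy illustration):

import numpy as np

g = np.array([3.0, 4.0])        # gradient with L2 norm 5.0
clipnorm = 1.0

norm = np.linalg.norm(g)
if norm > clipnorm:
    g = g * (clipnorm / norm)   # rescale so the norm becomes exactly clipnorm

print(g)                        # [0.6 0.8] -> direction kept, magnitude clipped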
