Exploding gradients in a CNN

Published 2024-05-03 18:16:33


I am trying to train a convolutional neural network on audio data. Its input is a JSON file that was already preprocessed in the correct way (no errors there). However, when training the model, the loss becomes NaN, which effectively stops training.

I am using TensorFlow 2.x on Google Colab. I tried lowering the learning rate, but I get the same NaN values.
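Before touching the model, it may be worth re-verifying the claim that the preprocessing is error-free: a single NaN or Inf in the features, or a label outside `[0, NUM_KEYWORDS)`, is enough to make `sparse_categorical_crossentropy` return NaN from the very first step. A minimal sanity check (`check_inputs` is a hypothetical helper, not part of the original code):

```python
import numpy as np

def check_inputs(X, y, num_classes):
    """Report common causes of NaN loss: bad values in X, out-of-range labels."""
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y)
    return {
        "nan_in_X": bool(np.isnan(X).any()),
        "inf_in_X": bool(np.isinf(X).any()),
        "labels_in_range": bool((y >= 0).all() and (y < num_classes).all()),
    }

# Example with a deliberately corrupted batch:
X = np.zeros((4, 8, 8, 1), dtype=np.float32)
X[0, 0, 0, 0] = np.nan
y = np.array([0, 1, 9, 10])  # 10 is out of range when num_classes=10
print(check_inputs(X, y, num_classes=10))
```

Running this on the real training arrays before `model.fit` takes a second and rules out (or confirms) the data as the culprit.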


Model:

import tensorflow as tf

SAVED_MODEL_PATH = 'model.h5'
LEARNING_RATE = 0.0001
EPOCHS = 40
BATCH_SIZE = 32
NUM_KEYWORDS = 10
PATIENCE = 5

def build_model(input_shape, learning_rate=LEARNING_RATE):
    '''Build the CNN: 3 conv blocks, flatten, dense head, softmax classifier, and compile.'''
    model = tf.keras.Sequential()

    # LAYER 1
    model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu',
                                     input_shape=input_shape,
                                     kernel_regularizer=tf.keras.regularizers.l2(l=0.001)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same'))

    # LAYER 2
    model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu',
                                     kernel_regularizer=tf.keras.regularizers.l2(l=0.001)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same'))

    # LAYER 3
    model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(2, 2), activation='relu',
                                     kernel_regularizer=tf.keras.regularizers.l2(l=0.001)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

    # FLATTEN INTO A DENSE HEAD
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(units=64, activation='relu'))
    model.add(tf.keras.layers.Dropout(rate=0.3))

    # SOFTMAX OUTPUT
    model.add(tf.keras.layers.Dense(units=NUM_KEYWORDS, activation='softmax'))

    # COMPILE
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    model.summary()
    return model
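If the data checks out, a common remedy for exploding gradients is to clip them. In `tf.keras` this is a single optimizer argument, e.g. `tf.keras.optimizers.Adam(learning_rate=learning_rate, clipnorm=1.0)` in the compile step above. The underlying operation is just a rescale by the global L2 norm, sketched here in NumPy (`clip_by_global_norm` below is an illustrative helper, not the Keras API):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at most clip_norm."""
    global_norm = float(np.sqrt(sum((g ** 2).sum() for g in grads)))
    if global_norm <= clip_norm:
        return grads, global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
print(norm)  # 13.0
print(round(float(np.sqrt(sum((g ** 2).sum() for g in clipped))), 6))  # 1.0
```

Clipping caps the update size without changing the gradient direction, so training can recover from occasional large gradients instead of diverging to NaN.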

Training:

def train_model(model, epochs, batch_size, X_train, Y_train, X_validation, Y_validation):
    # Stop early if training accuracy plateaus for PATIENCE epochs.
    earlystop_callback = tf.keras.callbacks.EarlyStopping(monitor="accuracy",
                                                          min_delta=0.001,
                                                          patience=PATIENCE)
    # Use the epochs/batch_size arguments rather than the module-level constants.
    history = model.fit(X_train,
                        Y_train,
                        epochs=epochs,
                        batch_size=batch_size,
                        validation_data=(X_validation, Y_validation),
                        callbacks=[earlystop_callback])
    return history
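Whatever the root cause turns out to be, it also helps to abort as soon as the loss turns NaN instead of burning six epochs on it; `tf.keras` ships `tf.keras.callbacks.TerminateOnNaN()`, which can simply be appended to the `callbacks` list above. The check it performs is trivial; a pure-Python sketch of the idea (`stop_on_nan` is illustrative, not the Keras implementation):

```python
import math

def stop_on_nan(loss_history):
    """Return the index of the first NaN loss, or None if training stayed finite."""
    for step, loss in enumerate(loss_history):
        if math.isnan(loss):
            return step
    return None

print(stop_on_nan([0.93, 0.51, float("nan"), float("nan")]))  # 2
print(stop_on_nan([0.93, 0.51, 0.40]))  # None
```

Knowing the exact step at which the loss first diverged (first epoch vs. mid-training) also narrows down the cause: here it is NaN from epoch 1, which points at the data or the very first forward/backward pass rather than gradual divergence.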

Output:

Epoch 1/40
1166/1166 [==============================] - 5s 4ms/step - loss: nan - accuracy: 0.0251 - val_loss: nan - val_accuracy: 0.0264
Epoch 2/40
1166/1166 [==============================] - 4s 4ms/step - loss: nan - accuracy: 0.0251 - val_loss: nan - val_accuracy: 0.0264
Epoch 3/40
1166/1166 [==============================] - 4s 4ms/step - loss: nan - accuracy: 0.0251 - val_loss: nan - val_accuracy: 0.0264
Epoch 4/40
1166/1166 [==============================] - 4s 4ms/step - loss: nan - accuracy: 0.0251 - val_loss: nan - val_accuracy: 0.0264
Epoch 5/40
1166/1166 [==============================] - 4s 4ms/step - loss: nan - accuracy: 0.0251 - val_loss: nan - val_accuracy: 0.0264
Epoch 6/40
1166/1166 [==============================] - 4s 4ms/step - loss: nan - accuracy: 0.0251 - val_loss: nan - val_accuracy: 0.0264
365/365 [==============================] - 1s 2ms/step - loss: nan - accuracy: 0.0261

Loss: nan
Accuracy: 0.026089942082762718
