Does the output of batch normalization in Keras depend on the number of epochs?

Published 2024-09-28 01:33:37


I am inspecting the output of BatchNormalization in Keras. My model is:

# Import libraries

import numpy as np
import keras
from keras.layers import Input, Dense, BatchNormalization
from keras.models import Model

# Model

def HappyModel3(input_shape):


    X_input = Input(input_shape, name='input_layer')
    X = BatchNormalization(axis = 1, name = 'batchnorm_layer')(X_input)
    
    X = Dense(1, activation='sigmoid', name='sigmoid_layer')(X)
    
    
    model = Model(inputs = X_input, outputs = X, name='HappyModel3')
    
    return model

    

Compile and train the model, here with epochs=1:

X_train=np.array([[1,1,-1],[2,1,1]])
Y_train=np.array([0,1])

happyModel_1=HappyModel3(X_train[0].shape)
happyModel_1.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_1.fit(x = X_train, y = Y_train, epochs = 1 , batch_size = 2, verbose=0 )

Getting the output of the batch normalization layer for the model trained with epochs=1:

for i in range(0, len(happyModel_1.layers)):
    
    tmp_model = Model(happyModel_1.layers[0].input, happyModel_1.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    
    if i in (0,1) :
        print(happyModel_1.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')

The output of the code is:

input_layer
(2, 3)
[[ 1.  1. -1.]
 [ 2.  1.  1.]]


batchnorm_layer
(2, 3)
[[ 0.99003249  0.99388224 -0.99551398]
 [ 1.99647105  0.99388224  0.9971655 ]]

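We normalized along axis=1 (the feature axis). In the batch norm layer output, the first feature has a mean of about 1.5, the second about 1, and the third about 0. Since this is batch normalization, I expected the mean of all three features to be close to 0.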

This is what happens when I increase the number of epochs to 1000:

happyModel_2=HappyModel3(X_train[0].shape)
happyModel_2.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_2.fit(x = X_train, y = Y_train, epochs = 1000 , batch_size = 2, verbose=0 )

Getting the output of the batch normalization layer for the model trained with epochs=1000:

for i in range(0, len(happyModel_2.layers)):
    tmp_model = Model(happyModel_2.layers[0].input, happyModel_2.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    
    if i in (0,1) :
        print(happyModel_2.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')

Code output:

input_layer
(2, 3)
[[ 1.  1. -1.]
 [ 2.  1.  1.]]


batchnorm_layer
(2, 3)
[[ -1.95576239e+00   8.08715820e-04  -1.86621261e+00]
 [  1.95795488e+00   8.08715820e-04   1.86590290e+00]]

We normalized along axis=1, and now the batch norm layer output has a mean of about 0 for the first, second, and third features. This is the output I expected.
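To check these per-feature means, the two printed batchnorm_layer outputs can be averaged with NumPy. The arrays below are copied from the printed output; the only other point is that with axis=1 each feature's statistics are taken over the batch, i.e. the mean runs over axis 0 of the output.

```python
import numpy as np

# batchnorm_layer outputs copied from the printed results above
out_epochs_1 = np.array([[0.99003249, 0.99388224, -0.99551398],
                         [1.99647105, 0.99388224, 0.9971655]])
out_epochs_1000 = np.array([[-1.95576239e+00, 8.08715820e-04, -1.86621261e+00],
                            [1.95795488e+00, 8.08715820e-04, 1.86590290e+00]])

# Averaging over the batch (axis 0) gives one mean per feature
means_1 = out_epochs_1.mean(axis=0)
means_1000 = out_epochs_1000.mean(axis=0)
print(means_1)     # approximately [1.493, 0.994, 0.001]
print(means_1000)  # every entry is within 0.002 of 0
```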

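My question is: does the output of batch normalization in Keras depend on the number of epochs? (Probably yes: since we do backpropagation, the batch normalization parameters will be affected as the number of epochs increases.)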


1 answer
Anonymous user
Answer #1 · posted 2024-09-28 01:33:37

The Keras documentation for BatchNormalization answers your question:

Importantly, batch normalization works differently during training and during inference.

What happens during training, i.e. when calling model.fit():

During training [...], the layer normalizes its output using the mean and standard deviation of the current batch of inputs.

And what happens during inference, e.g. when calling model.predict() as in your example:

During inference [...], the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. That is to say, it returns gamma * (batch - self.moving_mean) / sqrt(self.moving_var + epsilon) + beta.

self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode [...].
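A minimal NumPy sketch of the two modes described in the quotes, applied to X_train from the question. It assumes the Keras defaults momentum=0.99 and epsilon=1e-3, uses the biased batch variance, and keeps gamma=1 and beta=0 at their initial values (in the real run the optimizer also nudges them). The functions bn_training and bn_inference are hypothetical helpers for illustration, not Keras API.

```python
import numpy as np

# Assumed Keras defaults for BatchNormalization
MOMENTUM = 0.99
EPSILON = 1e-3

X = np.array([[1, 1, -1], [2, 1, 1]], dtype=np.float64)  # X_train from the question
gamma, beta = np.ones(3), np.zeros(3)  # initial values, kept fixed in this sketch


def bn_training(x):
    """Training mode (fit): normalize with the statistics of the current batch."""
    mean, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + EPSILON) + beta


def bn_inference(x, moving_mean, moving_var):
    """Inference mode (predict): normalize with the moving statistics."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + EPSILON) + beta


# Moving statistics start at 0 and 1 and receive one update per training step;
# with epochs=1 and batch_size=2 there is exactly one step.
moving_mean = MOMENTUM * np.zeros(3) + (1 - MOMENTUM) * X.mean(axis=0)
moving_var = MOMENTUM * np.ones(3) + (1 - MOMENTUM) * X.var(axis=0)

train_out = bn_training(X)
infer_out = bn_inference(X, moving_mean, moving_var)
print(train_out.mean(axis=0))  # per-feature means are 0
print(infer_out.mean(axis=0))  # roughly [1.49, 0.99, 0.0], like the epochs=1 output
```

After a single update the moving mean has only moved 1% of the way from 0 toward the batch mean, so inference-mode output is still close to the raw input. The first column comes out at about 0.988 and 1.991 versus the printed 0.990 and 1.996; the small gap presumably comes from the one optimizer step applied to gamma and beta, which this sketch leaves out.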

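It is important to understand that during training, batch normalization estimates the statistics (mean and variance) of the whole training data by looking at the statistics of individual batches: it internally updates the moving_mean and moving_variance parameters with running averages computed from the per-batch statistics. They are therefore not affected by backpropagation. Ideally, after your model has seen enough training examples (or trained long enough), moving_mean and moving_variance correspond to the statistics of the whole training set. These two parameters are then used during inference to normalize test examples. At the start of training, they are initialized to 0 and 1, respectively. In addition, batch normalization has two more parameters, gamma and beta, which are updated by the optimizer and therefore depend on your loss.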
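For the setup in the question, where each epoch is exactly one update on the same two-sample batch (batch_size=2), the running-average update can be solved in closed form. Here m is the momentum (0.99 by default in Keras), μ_B and σ²_B are the per-feature mean and variance of that batch, and μ_n, σ²_n are the moving statistics after n epochs, starting from the initial values 0 and 1:

```latex
\begin{aligned}
\mu_n &= m\,\mu_{n-1} + (1-m)\,\mu_B, \quad \mu_0 = 0
  &&\Longrightarrow\quad \mu_n = (1-m^n)\,\mu_B,\\
\sigma^2_n &= m\,\sigma^2_{n-1} + (1-m)\,\sigma^2_B, \quad \sigma^2_0 = 1
  &&\Longrightarrow\quad \sigma^2_n = m^n + (1-m^n)\,\sigma^2_B,\\
y &= \gamma\,\frac{x - \mu_n}{\sqrt{\sigma^2_n + \epsilon}} + \beta .
\end{aligned}
```

With m = 0.99, one epoch gives μ_1 = 0.01 μ_B (for the first feature, μ_B = 1.5 gives μ_1 = 0.015), whereas 0.99^1000 ≈ 4.3 × 10^-5, so after 1000 epochs μ_n ≈ μ_B and σ²_n ≈ σ²_B. Inference then normalizes with almost exactly the batch statistics, which is why the means end up close to 0.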

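So essentially, yes: during inference, the output of batch normalization depends on the number of epochs the model was trained for. First, because the moving averages of the mean and variance change, and second, because of the learned parameters gamma and beta.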
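To see the dependence on the number of epochs directly, the sketch below applies the moving-average update once per epoch (one batch per epoch, as in the question) and reports the per-feature mean of the inference-mode output. It assumes the Keras defaults momentum=0.99 and epsilon=1e-3 and the biased batch variance, and it freezes gamma=1 and beta=0 so that only the moving-statistics effect is visible. The helper names moving_stats_after and inference_output are hypothetical.

```python
import numpy as np

MOMENTUM = 0.99  # assumed Keras default
EPSILON = 1e-3   # assumed Keras default

X = np.array([[1, 1, -1], [2, 1, 1]], dtype=np.float64)  # X_train from the question
batch_mean, batch_var = X.mean(axis=0), X.var(axis=0)


def moving_stats_after(n_epochs):
    """Apply the moving-average update once per epoch, starting from mean 0 and variance 1."""
    moving_mean, moving_var = np.zeros(3), np.ones(3)
    for _ in range(n_epochs):
        moving_mean = MOMENTUM * moving_mean + (1 - MOMENTUM) * batch_mean
        moving_var = MOMENTUM * moving_var + (1 - MOMENTUM) * batch_var
    return moving_mean, moving_var


def inference_output(n_epochs):
    """Inference-mode output with gamma=1 and beta=0 held fixed (a simplification)."""
    moving_mean, moving_var = moving_stats_after(n_epochs)
    return (X - moving_mean) / np.sqrt(moving_var + EPSILON)


means = {n: inference_output(n).mean(axis=0) for n in (1, 10, 100, 1000)}
for n, m in means.items():
    print(f"epochs={n:4d}  per-feature means: {np.round(m, 4)}")
```

The first-feature mean shrinks from about 1.49 toward 0 as the moving statistics converge to the batch statistics, mirroring the epochs=1 and epochs=1000 outputs. At 1000 epochs the sketch's first column is about ±1 rather than the printed ±1.96, which is consistent with the second point above: gamma and beta are trained by the optimizer and are not modelled here.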

For a deeper understanding of how batch normalization works and why it is needed, have a look at the original publication (Ioffe & Szegedy, 2015).
