The MSE reported by GridSearchCV with Keras differs from a manual search

Posted 2024-05-20 16:05:58


I have been experimenting with a simple neural network using Keras and sklearn, and I ran into some unexpected results.

In my first experiment, the NN has one hidden layer with 64 neurons, and I run a 5-split KFold using the StratifiedKFold class:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy as np
import tensorflow as tf
import random

seed = 7
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

for train, test in kfold.split(X, Y):
    model = Sequential()
    model.add(Dense(64, input_dim=12, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    model.summary()
    model.fit(X_train[train], Y[train], epochs=10, verbose=1)
    y_pred = model.predict(X_train[test])
    mse_value, mae_value = model.evaluate(X_train[test], Y[test], verbose=1)
    print(mse_value)

On the first fold, this prints the following:

Model: "sequential_169"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_342 (Dense)            (None, 64)                832       
_________________________________________________________________
dense_343 (Dense)            (None, 1)                 65        
=================================================================
Total params: 897
Trainable params: 897
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
163/163 [==============================] - 0s 520us/step - loss: 23.5748 - mae: 4.6046
Epoch 2/10
163/163 [==============================] - 0s 503us/step - loss: 1.9301 - mae: 1.0770
Epoch 3/10
163/163 [==============================] - 0s 502us/step - loss: 1.0503 - mae: 0.8026
Epoch 4/10
163/163 [==============================] - 0s 492us/step - loss: 0.7895 - mae: 0.6828
Epoch 5/10
163/163 [==============================] - 0s 503us/step - loss: 0.6499 - mae: 0.6171
Epoch 6/10
163/163 [==============================] - 0s 524us/step - loss: 0.5652 - mae: 0.5795
Epoch 7/10
163/163 [==============================] - 0s 506us/step - loss: 0.5806 - mae: 0.5819
Epoch 8/10
163/163 [==============================] - 0s 506us/step - loss: 0.4949 - mae: 0.5497
Epoch 9/10
163/163 [==============================] - 0s 493us/step - loss: 0.4864 - mae: 0.5418
Epoch 10/10
163/163 [==============================] - 0s 492us/step - loss: 0.4942 - mae: 0.5455
41/41 [==============================] - 0s 474us/step - loss: 0.4861 - mae: 0.5457
0.48606643080711365
...

Note that during training the loss drops from 23.5748 to 0.4942.

In the second experiment, I use the GridSearchCV class to perform a grid search over the number of layers to use. (To illustrate my problem, I only tried one layer.) I also pass the same kfold strategy as in the previous experiment to GridSearchCV's constructor:

from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import GridSearchCV, StratifiedKFold
import numpy as np
import tensorflow as tf
import random

seed = 7
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)

def create_model(hidden_layers=1):
    # Initialize the constructor
    model = Sequential()

    # Add hidden layers
    for i in range(hidden_layers):
        if i == 0:
            model.add(Dense(64, input_dim=12, activation='relu'))
        else:
            model.add(Dense(64, activation='relu'))

    # Add an output layer 
    model.add(Dense(1))
        
    model.compile(optimizer='rmsprop', loss='mse', metrics=["mae"])
    model.summary()
        
    return model

model = KerasRegressor(build_fn=create_model, epochs=10, verbose=1)

param_grid = dict(hidden_layers=[1])

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

grid = GridSearchCV(estimator=model, param_grid=param_grid,
                    scoring=["neg_mean_absolute_error", "neg_mean_squared_error", "r2"],
                    refit="r2",
                    n_jobs=1, cv=kfold)

grid_result = grid.fit(X, Y)

With this approach, on the first fold I get the following output:

Model: "sequential_180"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_364 (Dense)            (None, 64)                832       
_________________________________________________________________
dense_365 (Dense)            (None, 1)                 65        
=================================================================
Total params: 897
Trainable params: 897
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
163/163 [==============================] - 0s 527us/step - loss: 9.8205 - mae: 2.3366
Epoch 2/10
163/163 [==============================] - 0s 479us/step - loss: 1.0685 - mae: 0.8089
Epoch 3/10
163/163 [==============================] - 0s 503us/step - loss: 0.9351 - mae: 0.7488
Epoch 4/10
163/163 [==============================] - 0s 503us/step - loss: 0.9602 - mae: 0.7560
Epoch 5/10
163/163 [==============================] - 0s 502us/step - loss: 1.0195 - mae: 0.7830
Epoch 6/10
163/163 [==============================] - 0s 494us/step - loss: 0.9774 - mae: 0.7761
Epoch 7/10
163/163 [==============================] - 0s 489us/step - loss: 0.9569 - mae: 0.7413
Epoch 8/10
163/163 [==============================] - 0s 488us/step - loss: 0.9772 - mae: 0.7794
Epoch 9/10
163/163 [==============================] - 0s 464us/step - loss: 0.8716 - mae: 0.7259
Epoch 10/10
163/163 [==============================] - 0s 494us/step - loss: 0.8687 - mae: 0.7248
41/41 [==============================] - 0s 380us/step
...

Here the loss behaves quite differently from the first experiment; it drops from 9.8205 to 0.8687.

Since I am:

  1. Setting all random seeds to the same value:
seed = 7
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
  2. Using the same KFold strategy:
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
  3. Building networks with the same architecture:
Model: "sequential_XXX"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_XXX (Dense)            (None, 64)                832       
_________________________________________________________________
dense_XXY (Dense)            (None, 1)                 65        
=================================================================
Total params: 897
Trainable params: 897
Non-trainable params: 0
  4. Using the same number of epochs and the same batch size,

I expected the two networks to produce the same results (at least on the first fold), yet the loss values differ.

How can the network in the first experiment behave differently from the one in the second?

EDIT

The problem is that in the first experiment I trained on X_train, while in the second I trained on X. X_train is a scaled version of X.

That said, Marco's point about the seeds still applies; please refer to his answer.
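The data mismatch is easy to see in isolation. A minimal NumPy sketch of the scaling step, assuming X_train was produced by standardization (the equivalent of sklearn's StandardScaler; the exact scaler used is not stated in the question):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(1000, 12))

# Standardize each feature to zero mean and unit variance
# (assumption: this is how X_train was derived from X).
X_train = (X - X.mean(axis=0)) / X.std(axis=0)

# The two arrays feed the network inputs on completely different scales,
# so identical seeds, folds, and architectures still yield different losses.
print(X_train.mean(axis=0).round(6))  # ~0 for every feature
print(X_train.std(axis=0).round(6))   # ~1 for every feature
```

This is why the first-epoch losses of the two experiments already disagree before any seeding effect comes into play.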


Tags: from, import, model, layers, tensorflow, step, train, random
1 Answer

Posted 2024-05-20 16:05:58

This happens simply because Keras performs random weight initialization every time a new model is built for each fold. Setting the seeds only once at the top makes the run reproducible as a whole, but each model's initial weights then depend on the execution order.
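The effect can be demonstrated without Keras: each freshly built model draws its initial weights from the global random state, so two consecutive builds differ unless the seed is reset in between. A minimal NumPy sketch, with a random draw standing in for a Keras initializer:

```python
import numpy as np

def build_weights(seed=None):
    # Stand-in for building a fresh model: draw a (12, 64) weight
    # matrix from the global random state, as an initializer would.
    if seed is not None:
        np.random.seed(seed)
    return np.random.uniform(-0.1, 0.1, size=(12, 64))

# Seed set once at the top: the two "models" get different initial
# weights, and each depends on how many draws preceded its build.
np.random.seed(7)
a = build_weights()
b = build_weights()
print(np.allclose(a, b))  # False

# Seed reset before every build: identical initial weights each time.
c = build_weights(seed=7)
d = build_weights(seed=7)
print(np.allclose(c, d))  # True
```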

To make the results identical, you just need to re-initialize the same seed every time a new fold is fit. We do this at the top of the create_model function, and use it both in a manual CV loop and with KerasRegressor plus cross_val_score (from sklearn):

import random
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

def create_model(hidden_layers=1):
    
    set_seed(seed=7) # <==== set the seed at the beginning every time
    
    model = Sequential()
    for i in range(hidden_layers):
        if i == 0:
            model.add(Dense(64, input_dim=12, activation='relu'))
        else:
            model.add(Dense(64, activation='relu'))
    model.add(Dense(1))        
    model.compile(optimizer='rmsprop', loss='mse')
        
    return model

Initialize some dummy data:

from sklearn.model_selection import StratifiedKFold

np.random.seed(7)
X = np.random.uniform(0, 1, (1000, 12))
Y = np.random.randint(0, 2, (1000,))

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

Manual CV:

for train, test in kfold.split(X, Y):
    model = create_model()
    model.fit(X[train], Y[train], epochs=1, verbose=1)
    mse_value = model.evaluate(X[test], Y[test], verbose=1)

Result:

25/25 [==============================] - 1s 1ms/step - loss: 0.2977
7/7 [==============================] - 0s 2ms/step - loss: 0.2508
25/25 [==============================] - 0s 1ms/step - loss: 0.2789
7/7 [==============================] - 0s 2ms/step - loss: 0.2696
25/25 [==============================] - 0s 979us/step - loss: 0.2760
7/7 [==============================] - 0s 2ms/step - loss: 0.2669
25/25 [==============================] - 0s 1ms/step - loss: 0.3076
7/7 [==============================] - 0s 2ms/step - loss: 0.2538
25/25 [==============================] - 0s 1ms/step - loss: 0.2807
7/7 [==============================] - 0s 2ms/step - loss: 0.2642

sklearn CV:

from sklearn.model_selection import cross_val_score
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

model_wrapper = KerasRegressor(build_fn=create_model, epochs=1, verbose=1)
cross_val_score(model_wrapper, X, Y, cv=kfold)

Result:

25/25 [==============================] - 0s 1ms/step - loss: 0.2977
7/7 [==============================] - 0s 2ms/step - loss: 0.2508
25/25 [==============================] - 1s 1ms/step - loss: 0.2789
7/7 [==============================] - 0s 2ms/step - loss: 0.2696
25/25 [==============================] - 0s 1ms/step - loss: 0.2760
7/7 [==============================] - 0s 2ms/step - loss: 0.2669
25/25 [==============================] - 0s 1ms/step - loss: 0.3076
7/7 [==============================] - 0s 2ms/step - loss: 0.2538
25/25 [==============================] - 0s 1ms/step - loss: 0.2807
7/7 [==============================] - 0s 2ms/step - loss: 0.2642

Running notebook: here

Note that this only holds on CPU; GPU execution introduces additional sources of nondeterminism.

相关问题 更多 >