How do I tune my neural network to avoid overfitting the MNIST dataset?

Posted 2024-10-04 09:23:00


TL;DR at the bottom.

Trying to learn the ins and outs of ML, I have been implementing a neural network optimizer in C++ and wrapping it as a Python module with SWIG. Naturally, the first problem I solved was XOR, via the following snippet: 2 inputs, 2 hidden nodes, 1 output.

from MikeLearn import NeuralNetwork
from MikeLearn import ClassificationOptimizer
import time

#=======================================================
# Training Set
#=======================================================

X = [[0,1],[1,0],[1,1],[0,0]]
Y = [[1],[1],[0],[0]]

nIn = len(X[0])
nOut = len(Y[0])

#=======================================================
# Model
#=======================================================
verbosity = 0

#Initialize neural network
# NeuralNetwork([nInputs, nHidden1, nHidden2,...,nOutputs],['Activation1','Activation2',...])
N = NeuralNetwork([nIn,2,nOut],['sigmoid','sigmoid'])
N.setLoggerVerbosity(verbosity)

#Initialize the classification optimizer
#ClassificationOptimizer(NeuralNetwork,Xtrain,Ytrain)
Opt = ClassificationOptimizer(N,X,Y)
Opt.setLoggerVerbosity(verbosity)

start_time = time.time()

#fit data
#fit(nEpoch,LearningRate)
E = Opt.fit(10000,0.1)
print("--- %s seconds ---" % (time.time() - start_time))

#Make a prediction
print(Opt.predict(X))

This code produces the following output (the correct answers are [1,1,0,0]):

--- 0.10273098945617676 seconds ---
((0.9398755431175232,), (0.9397522211074829,), (0.0612373948097229,), (0.04882470518350601,))
>>>

Looks great! Now for the problem. The following snippet tries to learn from the MNIST dataset, but clearly has an overfitting problem. 784 inputs (28x28 pixels), 50 hidden, 10 outputs.

from MikeLearn import NeuralNetwork
from MikeLearn import ClassificationOptimizer
import matplotlib.pyplot as plt
import numpy as np
import pickle
import time

#=======================================================
# Data Set
#=======================================================

#load the data dictionary
modeldata = pickle.load( open( "mnist_data.p", "rb" ) )
X = modeldata['X']
Y = modeldata['Y']

#normalize data
X = np.array(X)
X = X/255
X = X.tolist()

#training set
X1 = X[0:49999]
Y1 = Y[0:49999]

#validation set
X2 = X[50000:59999]
Y2 = Y[50000:59999]

#number of inputs/outputs
nIn = len(X[0]) #=784 (28x28 pixels)
nOut = len(Y[0]) #=10

#=======================================================
# Model
#=======================================================
verbosity = 1

#Initialize neural network
# NeuralNetwork([nInputs, nHidden1, nHidden2,...,nOutputs],['Activation1','Activation2',...])
N = NeuralNetwork([nIn,50,nOut],['sigmoid','sigmoid'])
N.setLoggerVerbosity(verbosity)

#Initialize optimizer
#ClassificationOptimizer(NeuralNetwork,Xtrain,Ytrain)
Opt = ClassificationOptimizer(N,X1,Y1)
Opt.setLoggerVerbosity(verbosity)

start_time = time.time()
#fit data
#fit(nEpoch,LearningRate)
E = Opt.fit(10,0.1)
print("--- %s seconds ---" % (time.time() - start_time))

#================================
#Final Accuracy on training set
#================================
XL = Opt.predict(X1)

correct = 0
for i,x in enumerate(XL):
    if XL[i].index(max(XL[i])) == Y1[i].index(max(Y1[i])):
        correct = correct + 1

print("Training set Correct = " +  str(correct))
Accuracy = correct/len(XL)*100
print("Accuracy = " + str(Accuracy) + '%')

#================================
#Final Accuracy on validation set
#================================
XL = Opt.predict(X2)

correct = 0
for i,x in enumerate(XL):
    if XL[i].index(max(XL[i])) == Y2[i].index(max(Y2[i])):
        correct = correct + 1

print("Testing set Correct = " +  str(correct))
Accuracy = correct/len(XL)*100
print("Accuracy = " + str(Accuracy)+'%')

This code produces the following output, showing the training accuracy and validation accuracy:

-------------------------
Epoch
9
-------------------------
E=
0.00696964
E=
0.350509
E=
3.49568e-05
E=
4.09073e-06
E=
1.38491e-06
E=
0.229873
E=
3.60186e-05
E=
0.000115187
E=
2.29978e-06
E=
2.69165e-06
--- 27.400235176086426 seconds ---
Training set Correct = 48435
Accuracy = 96.87193743874877%
Testing set Correct = 982
Accuracy = 9.820982098209821%

The accuracy on the training set is high, but the test set does barely better than random guessing. Any idea what might be causing this?

TL;DR

  1. Solved XOR with a model of 2 inputs, 2 hidden, 1 output and sigmoid activation functions. Great results.
  2. Tried to solve the MNIST dataset with a model of 784 inputs (28x28 pixels), 50 hidden, 10 outputs and sigmoid activation functions. Severe overfitting problem: ~97% accuracy on the training set, ~10% accuracy on the validation set.

Any idea what might be causing this?


1 Answer

Overfitting arises from the combination of the data and the model (the network in this case). During training the network is "lazy": it finds aspects of the data that work well on the training data but do not generalize.

It is hard or impossible to pinpoint exactly which nodes/weights in a trained network are responsible for the overfitting.
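This failure to generalize is easy to reproduce outside of neural networks. A minimal NumPy sketch (toy 1-D data and polynomial degrees chosen purely for illustration, not taken from the question): a high-capacity model chases the noise in a small training set, while a modest model is forced to learn the underlying trend.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: y = sin(x) plus noise, only 20 training points
x_train = rng.uniform(0, 3, 20)
y_train = np.sin(x_train) + rng.normal(0, 0.1, 20)
x_test = rng.uniform(0, 3, 200)
y_test = np.sin(x_test) + rng.normal(0, 0.1, 200)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# "Lazy" high-capacity model vs. a modest one
big = np.polyfit(x_train, y_train, 15)
small = np.polyfit(x_train, y_train, 3)

print("degree 15: train", mse(big, x_train, y_train), "test", mse(big, x_test, y_test))
print("degree 3:  train", mse(small, x_train, y_train), "test", mse(small, x_test, y_test))
```

The high-degree fit reaches a lower training error, but its test error is much worse than its training error: the same train/validation gap seen in the question.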

But we can avoid overfitting with several tricks:

  1. Regularization
  2. Dropout (easier to implement)
  3. Changing the network architecture (fewer layers / fewer nodes / more dimensionality reduction)
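For item 1, the most common form is L2 regularization (weight decay): a penalty proportional to the squared weights is added to the loss, which shows up in the gradient as an extra `l2 * w` term that shrinks every weight toward zero on each update. A minimal sketch of one SGD step with this penalty, in NumPy (illustrative names only, not part of the MikeLearn API from the question):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1, l2=0.0):
    # L2 regularization adds l2 * w to the data gradient,
    # pulling every weight toward zero ("weight decay").
    return w - lr * (grad + l2 * w)

rng = np.random.default_rng(1)
w = rng.normal(size=5)
grad = np.zeros(5)                   # pretend the data gradient is zero

plain = sgd_step(w, grad, l2=0.0)    # weights unchanged
decayed = sgd_step(w, grad, l2=0.5)  # every weight shrinks by lr*l2 = 5%
print(plain, decayed)
```

With a nonzero data gradient the two effects combine, so large weights, which tend to memorize individual training examples, are constantly penalized.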

https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/

To get a feel for regularization, try the TensorFlow playground:

https://playground.tensorflow.org/

A visualization of dropout:

https://yusugomori.com/projects/deep-learning/dropout-relu
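Dropout itself is only a few lines: during training each hidden unit is zeroed with probability `p_drop`, and (in the common "inverted" variant) the survivors are scaled up so the expected activation is unchanged, meaning nothing special is needed at test time. A hedged NumPy sketch (function name and shapes are illustrative):

```python
import numpy as np

def dropout(activations, p_drop, rng):
    # Zero each unit with probability p_drop, then rescale the
    # survivors by 1/(1 - p_drop) to keep the expected value the same.
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((4, 50))                       # a batch of hidden activations
h_train = dropout(h, p_drop=0.5, rng=rng)  # training-time only
# At test time the layer is used as-is: no mask, no rescaling.
print(h_train.mean())                      # roughly 1.0
```

Because a random half of the 50 hidden units disappears on every batch, no single unit can specialize on a handful of training examples, which is exactly the "lazy" behavior described above.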

Besides trying regularization techniques, also experiment with different network architectures.
