<p>I am trying to reproduce some of the examples from <a href="http://neuralnetworksanddeeplearning.com/index.html" rel="nofollow noreferrer">Neural Networks and Deep Learning</a> in Keras, but I am running into problems training a network based on the architecture from chapter 1. The goal is to classify handwritten digits from the MNIST dataset.
The architecture:</p>
<ul>
<li>784 inputs (one for each of the 28*28 pixels in an MNIST image)</li>
<li>A hidden layer of 30 neurons</li>
<li>An output layer of 10 neurons</li>
<li>Weights and biases initialized from a Gaussian distribution with mean 0 and standard deviation 1</li>
<li>The loss/cost function is mean squared error (the book's quadratic cost; see the sketch after this list)</li>
<li>The optimizer is stochastic gradient descent</li>
</ul>
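<p>For reference, the cost function I am trying to match is the quadratic cost from chapter 1 of the book, C(w,b) = 1/(2n) · Σ_x ||y(x) − a||². A rough NumPy version (my own sketch, not code from the book) would be:</p>
<pre class="lang-python prettyprint-override"><code>import numpy as np

# Quadratic cost from chapter 1 of the book:
# C(w, b) = 1/(2n) * sum over training samples x of ||y(x) - a||^2
# y_true and y_pred are (n_samples, 10) arrays of one-hot labels and network
# outputs respectively (the names are my own, just for illustration).
def quadratic_cost(y_true, y_pred):
    n = y_true.shape[0]
    return np.sum((y_true - y_pred) ** 2) / (2 * n)
</code></pre>
<p>I am assuming Keras's built-in 'mean_squared_error' is close enough for reproducing the example, even though it averages the squared errors instead of using the halved sum above, which should only rescale the gradients.</p>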
<p>Hyperparameters:</p>
<ul>
<li>Learning rate = 3.0</li>
<li>Batch size = 10</li>
<li>Epochs = 30</li>
</ul>
<p>My code:</p>
<pre class="lang-python prettyprint-override"><code>import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.initializers import RandomNormal
# import data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# input image dimensions
img_rows, img_cols = 28, 28
x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)
input_shape = (img_rows * img_cols,)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('y_train shape:', y_train.shape)
# Construct model
# 784 * 30 * 10
# Normal distribution for weights/biases
# Stochastic Gradient Descent optimizer
# Mean squared error loss (cost function)
model = Sequential()
layer1 = Dense(30,
               input_shape=input_shape,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer1)
layer2 = Dense(10,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer2)
print('Layer 1 input shape: ', layer1.input_shape)
print('Layer 1 output shape: ', layer1.output_shape)
print('Layer 2 input shape: ', layer2.input_shape)
print('Layer 2 output shape: ', layer2.output_shape)
model.summary()
model.compile(optimizer=SGD(lr=3.0),
              loss='mean_squared_error',
              metrics=['accuracy'])
# Train
model.fit(x_train,
          y_train,
          batch_size=10,
          epochs=30,
          verbose=2)
# Run on test data and output results
result = model.evaluate(x_test,
                        y_test,
                        verbose=1)
print('Test loss: ', result[0])
print('Test accuracy: ', result[1])
</code></pre>
<p>Output (with Python 3.6 and the TensorFlow backend):</p>
<p>(similar output is repeated for all 30 epochs; the final epoch is shown below)</p>
<pre class="lang-python prettyprint-override"><code>Epoch 30/30
- 6s - loss: nan - acc: 0.0987
10000/10000 [==============================] - 0s 22us/step
Test loss: nan
Test accuracy: 0.098
</code></pre>
<p>As you can see, the network is not learning at all, and I can't figure out why. The loss goes to nan and the accuracy stays at about 10%, which is what random guessing would give over 10 classes. As far as I can tell the shapes all look fine. What am I doing that prevents the network from learning?</p>
<p>(As an aside, I know that a cross-entropy loss and a softmax output layer would work better; however, judging from the linked book, they shouldn't be strictly necessary. The network implemented by hand in chapter 1 of the book learns successfully, and I'm trying to reproduce that before moving on.)</p>
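<p>For reference, the chapter 1 network in the book uses sigmoid neurons in every layer, whereas I did not specify any activation above (Keras's Dense layer defaults to a linear activation). A minimal sketch of the same 784-30-10 model with the sigmoid activations made explicit (same initializers, optimizer, and loss as above; I have not confirmed whether this is related to my problem) would be:</p>
<pre class="lang-python prettyprint-override"><code>from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.initializers import RandomNormal

# Same 784-30-10 architecture, but with the book's sigmoid neurons made explicit.
model = Sequential()
model.add(Dense(30,
                input_shape=(784,),
                activation='sigmoid',
                kernel_initializer=RandomNormal(stddev=1),
                bias_initializer=RandomNormal(stddev=1)))
model.add(Dense(10,
                activation='sigmoid',
                kernel_initializer=RandomNormal(stddev=1),
                bias_initializer=RandomNormal(stddev=1)))
model.compile(optimizer=SGD(lr=3.0),
              loss='mean_squared_error',
              metrics=['accuracy'])
</code></pre>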