I am trying to implement a TensorFlow version of this gist about reinforcement learning. Based on the comments, it uses binary cross entropy from logits. I tried tf.keras.losses.binary_crossentropy, but with the same inputs and initial weights it produces completely different gradients. During training the TensorFlow version performs terribly and does not learn at all, so something about it is definitely wrong, but I cannot figure out what.
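For reference, binary cross entropy computed from logits has the analytic gradient dL/dz = sigmoid(z) - y with respect to the logit z. Here is a minimal sketch to verify what tf.keras.losses.binary_crossentropy differentiates to (the toy values below are mine, not from the gist):

import tensorflow as tf

z = tf.constant([[2.0]])       # an arbitrary logit
y_true = tf.constant([[0.0]])
with tf.GradientTape() as tape:
    tape.watch(z)
    loss = tf.keras.losses.binary_crossentropy(y_true, z, from_logits=True)
grad = tape.gradient(loss, z)
print(grad.numpy())                       # gradient TensorFlow computes
print((tf.sigmoid(z) - y_true).numpy())   # analytic sigmoid(z) - y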
Take a look at the test I did:
import numpy as np
import tensorflow as tf

x_size = 2
h_size = 3
y_size = 1

rms_discount = 0.99
epsilon = 1e-7
learning_rate = 0.001

x = np.arange(x_size).astype('float32').reshape([1, -1])
y = np.zeros([1, y_size]).astype('float32')
r = np.ones([1, 1]).astype('float32')

# Identical initial weights for the NumPy and TensorFlow versions.
wh1 = np.arange(x_size * h_size).astype('float32').reshape([x_size, h_size])
wy1 = np.arange(h_size * y_size).astype('float32').reshape([h_size, y_size])

cache_wh1 = np.zeros_like(wh1)
cache_wy1 = np.zeros_like(wy1)

# Constructed for completeness; the updates below apply RMSprop by hand.
optimizer = tf.keras.optimizers.RMSprop(learning_rate, rms_discount, epsilon=epsilon)

wh2 = tf.keras.layers.Dense(
    h_size,
    activation='relu',
    use_bias=False,
    kernel_initializer=tf.keras.initializers.constant(wh1)
)
wy2 = tf.keras.layers.Dense(
    y_size,
    activation=None,
    use_bias=False,
    kernel_initializer=tf.keras.initializers.constant(wy1)
)

cache_wh2 = np.zeros_like(wh1)
cache_wy2 = np.zeros_like(wy1)
for i in range(100):
    # NumPy version: forward pass, manual backprop, manual RMSprop update.
    h1 = np.matmul(x, wh1)
    h1[h1 < 0] = 0.                      # ReLU
    y_pred1 = np.matmul(h1, wy1)

    dCdy = -(y - y_pred1)
    dCdwy = np.matmul(h1.T, dCdy)
    dCdh = np.matmul(dCdy, wy1.T)
    dCdh[h1 <= 0] = 0                    # ReLU gradient; h1 is already clamped, so a strict `< 0` mask would never match
    dCdwh = np.matmul(x.T, dCdh)
    gradients1 = [dCdwh, dCdwy]

    cache_wh1 = rms_discount * cache_wh1 + (1 - rms_discount) * dCdwh**2
    wh1 -= learning_rate * dCdwh / (np.sqrt(cache_wh1) + epsilon)
    cache_wy1 = rms_discount * cache_wy1 + (1 - rms_discount) * dCdwy**2
    wy1 -= learning_rate * dCdwy / (np.sqrt(cache_wy1) + epsilon)

    # TensorFlow version: same forward pass, gradients from binary cross entropy on logits.
    with tf.GradientTape() as tape:
        h2 = wh2(x)
        y_pred2 = wy2(h2)
        loss = tf.keras.losses.binary_crossentropy(y, y_pred2, from_logits=True)
    gradients2 = tape.gradient(loss, wh2.trainable_variables + wy2.trainable_variables)

    # set_weights expects a list of arrays, one per weight tensor.
    cache_wh2 = rms_discount * cache_wh2 + (1 - rms_discount) * gradients2[0]**2
    wh2.set_weights([wh2.get_weights()[0] - learning_rate * gradients2[0] / (np.sqrt(cache_wh2) + epsilon)])
    cache_wy2 = rms_discount * cache_wy2 + (1 - rms_discount) * gradients2[1]**2
    wy2.set_weights([wy2.get_weights()[0] - learning_rate * gradients2[1] / (np.sqrt(cache_wy2) + epsilon)])

    print('1', gradients1[0])
    print('1', gradients1[1])
    print('2', gradients2[0])
    print('2', gradients2[1])
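To make the mismatch concrete, the two sets of gradients can also be compared numerically rather than eyeballed (a small sketch, assuming the loop above has run):

for g1, g2 in zip(gradients1, gradients2):
    print(np.allclose(g1, np.asarray(g2), atol=1e-5))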
The partial derivative of the cost/loss with respect to y_pred is the same in both versions, so the rest should be standard backpropagation, just with RMSprop. Yet they behave differently. Why?
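As an aside, the RMSprop optimizer constructed above is never actually used; the idiomatic TensorFlow update would hand it the gradients directly inside the loop (a sketch; note that Keras's RMSprop may add epsilon inside the square root, so its numbers can differ slightly from the manual cache update):

optimizer.apply_gradients(zip(gradients2, wh2.trainable_variables + wy2.trainable_variables))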