How does TensorFlow know which derivative function to use in minimize()?

model9 = tf.nn.relu(tf.matmul(x1,w9)+b)
model10 = tf.nn.sigmoid(tf.matmul(model9,w10)+b)
error = tf.reduce_mean(tf.square(model10-y))
train = tf.train.AdamOptimizer(learning_rate=0.001).minimize(error)

2 Answers

User

#1 · edited 2024-09-24 10:23:01

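Yes, your code builds a TensorFlow computation graph whose nodes represent operations and variables. Because TensorFlow knows the gradient of each operation (that is, the gradient of the operation's output with respect to each of its inputs), it can use the backpropagation algorithm to update the variables during gradient descent, applying the correct derivative for each activation function along the way. For an excellent explanation of backpropagation, see: http://cs224d.stanford.edu/lecture_notes/notes3.pdf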
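
To make the idea of "every operation knows its own gradient" concrete, here is a minimal sketch of reverse-mode differentiation in plain Python. It is a toy for scalars only, not TensorFlow's implementation: the `Node` class, the helper functions, `backward`, and `build_error` are all hypothetical names invented for this illustration, and the input values are arbitrary.

```python
import math


class Node:
    """A scalar value in a toy computation graph (illustrative only)."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(this)/d(parent), one per parent
        self.grad = 0.0                 # d(final output)/d(this), set by backward()


# Each operation records its own local derivative when it is applied.
def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))


def sub(a, b):
    return Node(a.value - b.value, (a, b), (1.0, -1.0))


def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))


def relu(a):
    return Node(max(0.0, a.value), (a,), (1.0 if a.value > 0 else 0.0,))


def sigmoid(a):
    s = 1.0 / (1.0 + math.exp(-a.value))
    return Node(s, (a,), (s * (1.0 - s),))


def square(a):
    return Node(a.value ** 2, (a,), (2.0 * a.value,))


def backward(output):
    """Propagate gradients from output back to every input (chain rule)."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before their children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # children first, then their parents
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local


def build_error(w9_value, w10_value):
    """Scalar analogue of the question's two-layer model and squared error."""
    x, y, b = Node(2.0), Node(1.0), Node(0.1)
    w9, w10 = Node(w9_value), Node(w10_value)
    model9 = relu(add(mul(x, w9), b))
    model10 = sigmoid(add(mul(model9, w10), b))
    return square(sub(model10, y)), w9, w10


error, w9, w10 = build_error(0.5, -1.0)
backward(error)
print("d(error)/d(w9)  =", w9.grad)
print("d(error)/d(w10) =", w10.grad)
```

Nothing in `backward` mentions ReLU or sigmoid by name. The correct derivative is picked up automatically because each operation stored it when the graph was built, which is the same principle the answer describes.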

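Using a different learning rate for each layer is less straightforward, but you can do it by splitting the minimize call into its two component parts, compute_gradients and apply_gradients, and then modifying the gradients to effectively change the learning rate. Like this: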

import tensorflow as tf  # TensorFlow 1.x API (tf.train.AdamOptimizer)

# x1, y, w9, w10 and b are assumed to be defined as in the question
model9 = tf.nn.relu(tf.matmul(x1, w9) + b)
model10 = tf.nn.sigmoid(tf.matmul(model9, w10) + b)
error = tf.reduce_mean(tf.square(model10 - y))

optimiser = tf.train.AdamOptimizer(learning_rate=0.001)
# Compute the gradients of error with respect to w9 and w10
gradients = optimiser.compute_gradients(error, [w9, w10])

# gradients is a list of (gradient, variable) tuples. Tuples are immutable,
# so rebuild the first pair with the gradient of w9 multiplied by 10.
grad_w9, var_w9 = gradients[0]
gradients[0] = (grad_w9 * 10, var_w9)
train = optimiser.apply_gradients(gradients)  # New train op
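
How much scaling a gradient changes the step depends on the optimizer. The sketch below compares a single update under plain gradient descent with the first step of the textbook Adam update rule. The functions `sgd_step` and `adam_first_step` are hypothetical helpers written for this illustration, the numbers are arbitrary, and TensorFlow's own Adam implementation may differ in details such as how epsilon is applied.

```python
import math


def sgd_step(w, grad, lr):
    """One plain gradient-descent update."""
    return w - lr * grad


def adam_first_step(w, grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """First step of the textbook Adam update, starting from m = v = 0."""
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1)  # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


w, grad, lr = 1.0, 0.2, 0.001

# Plain gradient descent: a 10x gradient is the same as a 10x learning rate.
print(sgd_step(w, grad * 10, lr), sgd_step(w, grad, lr * 10))

# Adam divides by the gradient's own magnitude, so the scale largely cancels.
print(adam_first_step(w, grad, lr), adam_first_step(w, grad * 10, lr))
```

With plain gradient descent the two printed values agree, so multiplying the gradient really does act as a per-variable learning rate. With Adam the two values are almost identical to each other, which suggests the 10x factor has little effect there. An alternative worth considering in that case is one optimizer per group of variables, each with its own learning_rate.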

User

#2 · edited 2024-09-24 10:23:01

Is TensorFlow really so smart that it would 'iterate' through all layers, check each activation function, and apply gradient descent based on the activation function's derivative?

Yes. That is the whole point of using TensorFlow.
