GradientTape gives different gradients depending on whether the loss function is decorated with tf.function



I find that the computed gradients depend on how tf.function decorators interact, as shown below.

First, I create some synthetic data for binary classification:

<pre><code>import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
np.random.seed(42)
x = tf.random.normal((2, 1))
y = tf.constant(np.random.choice([0, 1], 2))
</code></pre>

Then I define two loss functions that differ only in the tf.function decorator:

<pre><code>weights = tf.constant([1., .1])[tf.newaxis, ...]

# Undecorated loss: per-class weights are applied by scaling the one-hot labels.
def customloss1(y_true, y_pred, sample_weight=None):
    y_true_one_hot = tf.one_hot(tf.cast(y_true, tf.uint8), 2)
    y_true_scale = tf.multiply(weights, y_true_one_hot)
    return tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true_scale, y_pred))

# Same loss, but wrapped in tf.function.
@tf.function
def customloss2(y_true, y_pred, sample_weight=None):
    y_true_one_hot = tf.one_hot(tf.cast(y_true, tf.uint8), 2)
    y_true_scale = tf.multiply(weights, y_true_one_hot)
    return tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true_scale, y_pred))
</code></pre>

Next, I build a very simple logistic regression model, with all the bells and whistles removed to keep it simple:

<pre><code>tf.random.set_seed(42)
np.random.seed(42)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, use_bias=False, activation='softmax', input_shape=[1,])
])
</code></pre>

Finally, I define two functions that compute the gradients of the loss functions above, one decorated with tf.function and one not:

<pre><code>def get_gradients1(x, y):
    with tf.GradientTape() as tape1:
        p1 = model(x)
        l1 = customloss1(y, p1)
    with tf.GradientTape() as tape2:
        p2 = model(x)
        l2 = customloss2(y, p2)
    gradients1 = tape1.gradient(l1, model.trainable_variables)
    gradients2 = tape2.gradient(l2, model.trainable_variables)
    return gradients1, gradients2

@tf.function
def get_gradients2(x, y):
    with tf.GradientTape() as tape1:
        p1 = model(x)
        l1 = customloss1(y, p1)
    with tf.GradientTape() as tape2:
        p2 = model(x)
        l2 = customloss2(y, p2)
    gradients1 = tape1.gradient(l1, model.trainable_variables)
    gradients2 = tape2.gradient(l2, model.trainable_variables)
    return gradients1, gradients2
</code></pre>

Now when I run

<pre><code>get_gradients1(x, y)
</code></pre>

I get

<pre><code>([<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.11473544, -0.11473544]], dtype=float32)>],
 [<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.11473544, -0.11473544]], dtype=float32)>])
</code></pre>
and the two gradients are identical, as expected. But when I run

<pre><code>get_gradients2(x, y)
</code></pre>

I get

<pre><code>([<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.02213785, -0.5065186 ]], dtype=float32)>],
 [<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.11473544, -0.11473544]], dtype=float32)>])
</code></pre>

Only the second result is correct. So when the outer function is decorated, I get the correct answer only from the inner function that is also decorated. I was under the impression that decorating the outer function (the training loop in many applications) would be enough, but here we see that it is not. I would like to understand why, and how deep one has to go in decorating the functions being used.

Added some debugging information

I added some debugging output. Only the code for customloss2 is shown (the other function is identical):

<pre><code>@tf.function
def customloss2(y_true, y_pred, sample_weight=None):
    y_true_one_hot = tf.one_hot(tf.cast(y_true, tf.uint8), 2)
    y_true_scale = tf.multiply(weights, y_true_one_hot)
    tf.print('customloss2', type(y_true_scale), type(y_pred))
    tf.print('y_true_scale', '\n', y_true_scale)
    tf.print('y_pred', '\n', y_pred)
    return tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true_scale, y_pred))
</code></pre>

Running get_gradients1, I get

<pre><code>customloss1 <type 'EagerTensor'> <type 'EagerTensor'>
y_true_scale
 [[1 0]
 [0 0.1]]
y_pred
 [[0.510775387 0.489224613]
 [0.529191136 0.470808864]]
customloss2 <class 'tensorflow.python.framework.ops.Tensor'> <class 'tensorflow.python.framework.ops.Tensor'>
y_true_scale
 [[1 0]
 [0 0.1]]
y_pred
 [[0.510775387 0.489224613]
 [0.529191136 0.470808864]]
</code></pre>

The tensors in customloss1 are EagerTensors while those in customloss2 are graph Tensors, yet the gradient values are the same.

On the other hand, when I run get_gradients2, I get

<pre><code>customloss1 <class 'tensorflow.python.framework.ops.Tensor'> <class 'tensorflow.python.framework.ops.Tensor'>
y_true_scale
 [[1 0]
 [0 0.1]]
y_pred
 [[0.510775387 0.489224613]
 [0.529191136 0.470808864]]
customloss2 <class 'tensorflow.python.framework.ops.Tensor'> <class 'tensorflow.python.framework.ops.Tensor'>
y_true_scale
 [[1 0]
 [0 0.1]]
y_pred
 [[0.510775387 0.489224613]
 [0.529191136 0.470808864]]
</code></pre>

Here everything is identical and none of the tensors is an EagerTensor, yet the two gradients differ.
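As an independent sanity check on which gradient is the expected one, the loss can be differentiated by hand without TensorFlow. The sketch below is only an illustration in plain Python: the inputs `xs` and weights `w` are hypothetical values, not the ones from the runs above, and the helper names are made up for this example. It compares the analytic gradient of the weighted cross-entropy with a finite-difference estimate, and also evaluates the familiar `p - t` shortcut, which is only valid when every label row sums to 1. The scaled labels here do not (the second row sums to 0.1). Whether this has anything to do with the behaviour inside the decorated function is not established here; it is an observation, not a diagnosis.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss(w, xs, targets):
    # Mean over samples of -sum_j t_j * log(softmax(x * w)_j).
    total = 0.0
    for x, t in zip(xs, targets):
        p = softmax([x * wj for wj in w])
        total += -sum(tj * math.log(pj) for tj, pj in zip(t, p))
    return total / len(xs)

def analytic_grad(w, xs, targets):
    # dL/dw_j = mean_i x_i * (p_ij * sum_k t_ik - t_ij); holds for any label rows.
    g = [0.0] * len(w)
    for x, t in zip(xs, targets):
        p = softmax([x * wj for wj in w])
        s = sum(t)
        for j in range(len(w)):
            g[j] += x * (p[j] * s - t[j]) / len(xs)
    return g

def shortcut_grad(w, xs, targets):
    # mean_i x_i * (p_ij - t_ij); correct ONLY if each label row sums to 1.
    g = [0.0] * len(w)
    for x, t in zip(xs, targets):
        p = softmax([x * wj for wj in w])
        for j in range(len(w)):
            g[j] += x * (p[j] - t[j]) / len(xs)
    return g

def numeric_grad(w, xs, targets, eps=1e-6):
    # Central finite differences on the loss.
    g = []
    for j in range(len(w)):
        wp = list(w)
        wm = list(w)
        wp[j] += eps
        wm[j] -= eps
        g.append((loss(wp, xs, targets) - loss(wm, xs, targets)) / (2 * eps))
    return g

# Hypothetical inputs and weights (assumptions, not the values from the question).
xs = [0.5, -1.0]
w = [0.3, -0.2]
# Scaled one-hot labels, as printed in the debug output: [[1, 0], [0, 0.1]].
targets = [[1.0, 0.0], [0.0, 0.1]]

g_analytic = analytic_grad(w, xs, targets)
g_numeric = numeric_grad(w, xs, targets)
g_shortcut = shortcut_grad(w, xs, targets)

print("analytic:", g_analytic, "sum =", sum(g_analytic))
print("numeric: ", g_numeric)
print("shortcut:", g_shortcut, "sum =", sum(g_shortcut))
```

In this sketch the analytic and finite-difference gradients agree and their components sum to zero, which matches the pattern of the `[0.11473544, -0.11473544]` result. The shortcut gradient does not sum to zero, and neither does the `[0.02213785, -0.5065186]` result from get_gradients2.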
