<p>[UPDATE] In the second example, you are not passing <code>y1.grad</code> into <code>y1.backward</code> correctly. After the first <code>backward</code> call, all intermediate gradients are destroyed, and you need a special hook to extract that gradient. In your case you are passing a <code>None</code> value. Here is a small example to reproduce your case:</p>
<p>Code:</p>
<pre class="lang-py prettyprint-override"><code>import torch
import torch.nn as nn
torch.manual_seed(42)
class Model1(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
return x.pow(3)
class Model2(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
return x / 2
model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()
X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
# We are going to backprop 2 times, so we need to
# retain_graph=True while first backward
loss.backward(retain_graph=True)
try:
y1.backward(y1.grad)
except RuntimeError as err:
print(err)
print('y1.grad: ', y1.grad)
</code></pre>
<p>Output:</p>
<pre class="lang-py prettyprint-override"><code>grad can be implicitly created only for scalar outputs
y1.grad: None
</code></pre>
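<p>The error arises because <code>y1.grad</code> is still <code>None</code> at that point, so <code>y1.backward(y1.grad)</code> is the same as calling <code>backward()</code> with no argument, which PyTorch only allows for scalar outputs. A minimal sketch (reusing the imports above) that triggers the same error directly:</p>
<pre class="lang-py prettyprint-override"><code># backward() without an explicit gradient argument only works
# for scalar (0-dim) outputs; a (1, 5) tensor triggers the error
t = torch.randn(1, 5, requires_grad=True)
out = t * 2  # non-scalar output
try:
    out.backward()  # equivalent to out.backward(None)
except RuntimeError as err:
    print(err)  # grad can be implicitly created only for scalar outputs
</code></pre>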
<p>So you need to extract the gradient correctly:</p>
<p>Code:</p>
<pre class="lang-py prettyprint-override"><code>def extract(V):
"""Gradient extractor.
"""
def hook(grad):
V.grad = grad
return hook
model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()
X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
y1.register_hook(extract(y1))
loss.backward(retain_graph=True)
print('y1.grad', y1.grad)
y1.backward(y1.grad)
</code></pre>
<p>Output:</p>
<pre class="lang-py prettyprint-override"><code>y1.grad: tensor([[-0.1763, -0.2114, -0.0266, -0.3293, 0.0534]])
</code></pre>
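<p>As a side note, PyTorch also provides <code>Tensor.retain_grad()</code>, which keeps the gradient of a non-leaf tensor without writing a manual hook. A minimal sketch of the same setup using it (an alternative, not part of the original code):</p>
<pre class="lang-py prettyprint-override"><code>model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()

X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)

y1 = model1(X)
y1.retain_grad()  # ask autograd to keep the gradient of this non-leaf tensor
y2 = model2(y1)
loss = criterion(y2, y)

loss.backward(retain_graph=True)
print('y1.grad:', y1.grad)  # populated, no hook needed
y1.backward(y1.grad)
</code></pre>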