PyTorch parameters not changing

    import torch

    def make_covariance_matrix(sigma, rho):
        return torch.tensor([[sigma[0]**2, rho * torch.prod(sigma)],
                             [rho * torch.prod(sigma), sigma[1]**2]])

    mu_true = torch.randn(2)
    rho_true = torch.rand(1)
    sigma_true = torch.exp(torch.rand(2))

    cov_true = make_covariance_matrix(sigma_true, rho_true)
    dist_true = torch.distributions.MultivariateNormal(mu_true, cov_true)

    samples = dist_true.sample((1_000,))

    mu = torch.zeros(2, requires_grad=True)
    log_sigma = torch.zeros(2, requires_grad=True)
    atanh_rho = torch.zeros(1, requires_grad=True)

    lbfgs = torch.optim.LBFGS([mu, log_sigma, atanh_rho])

    def closure():
        lbfgs.zero_grad()
        sigma = torch.exp(log_sigma)
        rho = torch.tanh(atanh_rho)
        cov = make_covariance_matrix(sigma, rho)
        dist = torch.distributions.MultivariateNormal(mu, cov)
        loss = -torch.mean(dist.log_prob(samples))
        loss.backward()
        return loss

    lbfgs.step(closure)

    print("mu: {}, mu_hat: {}".format(mu_true, mu))
    print("sigma: {}, sigma_hat: {}".format(sigma_true, torch.exp(log_sigma)))
    print("rho: {}, rho_hat: {}".format(rho_true, torch.tanh(atanh_rho)))

1 answer

User

Post #1 · posted 2024-09-28 22:30:17

The way you create the covariance matrix cannot be backpropagated through:

def make_covariance_matrix(sigma, rho):
    return torch.tensor([[sigma[0]**2, rho * torch.prod(sigma)],
                         [rho * torch.prod(sigma), sigma[1]**2]])

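When you create a new tensor from one or more existing tensors with torch.tensor(), only the values of the input tensors are kept. Everything else attached to them (requires_grad, the grad_fn history) is stripped away, so every graph connection to your parameters is cut at this point and backpropagation cannot get through.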

Here is a short example to illustrate this:

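A minimal sketch of such an example, assuming two randomly initialized one-element parameters named param1 and param2 (the names match the explanation below). Because torch.rand is used, the printed values will differ from run to run; only the True/False flags matter.

```python
import torch

def rebuild_from_params():
    # Two trainable one-element parameters (names chosen for this sketch)
    param1 = torch.rand(1, requires_grad=True)
    param2 = torch.rand(1, requires_grad=True)
    # torch.tensor() copies only the values: the result is a fresh tensor
    # with requires_grad=False and no grad_fn linking it to param1/param2
    new_tensor = torch.tensor([param1, param2])
    return param1, param2, new_tensor

param1, param2, new_tensor = rebuild_from_params()
print("Original parameter 1:")
print(param1.detach(), param1.requires_grad)
print("Original parameter 2:")
print(param2.detach(), param2.requires_grad)
print("New tensor form params:")
print(new_tensor, new_tensor.requires_grad)
```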

Output:

Original parameter 1:
tensor([ 0.8913]) True
Original parameter 2:
tensor([ 0.4785]) True
New tensor form params:
tensor([ 0.8913,  0.4785]) False

As you can see, the tensor created from param1 and param2 has requires_grad=False, so it does not track gradients back to param1 and param2.
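The practical consequence can be sketched as follows (param1 and param2 are illustrative names, as above): calling backward() on anything computed from the torch.tensor() copy raises a RuntimeError, while concatenating the original tensors with torch.cat leaves them in the graph, so gradients arrive. The expected outputs in the comments follow from d(sum)/d(param) = 1.

```python
import torch

param1 = torch.rand(1, requires_grad=True)
param2 = torch.rand(1, requires_grad=True)

# Values copied by torch.tensor(): the graph is cut here
detached = torch.tensor([param1, param2])
try:
    detached.sum().backward()
    backward_failed = False
except RuntimeError:
    # autograd refuses: the copy neither requires grad nor has a grad_fn
    backward_failed = True

# Concatenating the original tensors keeps them in the autograd graph
connected = torch.cat([param1, param2])
connected.sum().backward()

print(backward_failed)            # True
print(param1.grad, param2.grad)   # tensor([1.]) tensor([1.])
```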

So you can use the following code instead, which keeps the graph connected and lets gradients flow back to your parameters:

    def make_covariance_matrix(sigma, rho):
        # Build all four entries as 1-element tensors so autograd can track them
        cov = torch.cat([(sigma[0]**2).view(-1), rho * torch.prod(sigma),
                         rho * torch.prod(sigma), (sigma[1]**2).view(-1)])
        return cov.view(2, 2)

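The values are concatenated into a flat tensor with torch.cat, and view() then reshapes it into a 2×2 matrix.
This produces the same matrix as your original function, but keeps the connection to the parameters log_sigma and atanh_rho.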
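To check this claim, here is a small sketch that compares the question's torch.tensor version with the torch.cat version at the question's starting point (log_sigma = 0, atanh_rho = 0) and backpropagates through the fixed one. It also includes a torch.stack variant, which is an alternative I am adding for comparison, not part of the answer. At this point the sum of the matrix entries is s0² + 2·rho·s0·s1 + s1² with s = exp(log_sigma) = 1 and rho = tanh(0) = 0, so every gradient works out to exactly 2.

```python
import torch

def make_covariance_matrix_tensor(sigma, rho):
    # The question's version: torch.tensor() copies values and cuts the graph
    return torch.tensor([[sigma[0]**2, rho * torch.prod(sigma)],
                         [rho * torch.prod(sigma), sigma[1]**2]])

def make_covariance_matrix(sigma, rho):
    # The answer's version: torch.cat keeps every entry in the autograd graph
    cov = torch.cat([(sigma[0]**2).view(-1), rho * torch.prod(sigma),
                     rho * torch.prod(sigma), (sigma[1]**2).view(-1)])
    return cov.view(2, 2)

def make_covariance_matrix_stack(sigma, rho):
    # Alternative with torch.stack (not from the answer); rho has shape (1,),
    # so squeeze it to 0-d to match the other scalar entries
    off_diag = rho.squeeze() * torch.prod(sigma)
    return torch.stack([torch.stack([sigma[0]**2, off_diag]),
                        torch.stack([off_diag, sigma[1]**2])])

# Same parameterization and starting point as in the question
log_sigma = torch.zeros(2, requires_grad=True)
atanh_rho = torch.zeros(1, requires_grad=True)
sigma = torch.exp(log_sigma)
rho = torch.tanh(atanh_rho)

cov_old = make_covariance_matrix_tensor(sigma, rho)
cov_new = make_covariance_matrix(sigma, rho)
cov_stack = make_covariance_matrix_stack(sigma, rho)
print(cov_old.grad_fn is None, cov_new.grad_fn is not None)  # True True

# Backpropagate through the fixed version: gradients reach both parameters
cov_new.sum().backward()
print(log_sigma.grad, atanh_rho.grad)  # tensor([2., 2.]) tensor([2.])
```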

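Below is the output before and after the optimizer step, with make_covariance_matrix changed as above. As you can see, the parameters are now optimized and their values change: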

Before:
mu: tensor([ 0.1191,  0.7215]), mu_hat: tensor([ 0.,  0.])
sigma: tensor([ 1.4222,  1.0949]), sigma_hat: tensor([ 1.,  1.])
rho: tensor([ 0.2558]), rho_hat: tensor([ 0.])

After:
mu: tensor([ 0.1191,  0.7215]), mu_hat: tensor([ 0.0712,  0.7781])
sigma: tensor([ 1.4222,  1.0949]), sigma_hat: tensor([ 1.4410,  1.0807])
rho: tensor([ 0.2558]), rho_hat: tensor([ 0.2235])
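For reference, a self-contained sketch of the whole fitting script with the fixed function dropped in. Two additions are assumptions of mine, not from the question: torch.manual_seed(0) for repeatability, and a loss measurement before and after lbfgs.step to confirm that the optimizer actually made progress. Your numbers will not match the output above, since that run used an unseeded random ground truth.

```python
import torch

def make_covariance_matrix(sigma, rho):
    # Fixed version from the answer: keeps the graph connection to sigma and rho
    cov = torch.cat([(sigma[0]**2).view(-1), rho * torch.prod(sigma),
                     rho * torch.prod(sigma), (sigma[1]**2).view(-1)])
    return cov.view(2, 2)

torch.manual_seed(0)  # assumption: seeded for repeatability, not in the question

# Ground-truth distribution and samples, as in the question
mu_true = torch.randn(2)
rho_true = torch.rand(1)
sigma_true = torch.exp(torch.rand(2))
cov_true = make_covariance_matrix(sigma_true, rho_true)
dist_true = torch.distributions.MultivariateNormal(mu_true, cov_true)
samples = dist_true.sample((1_000,))

# Unconstrained parameters: exp keeps sigma > 0, tanh keeps rho in (-1, 1)
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
atanh_rho = torch.zeros(1, requires_grad=True)
lbfgs = torch.optim.LBFGS([mu, log_sigma, atanh_rho])

def negative_log_likelihood():
    sigma = torch.exp(log_sigma)
    rho = torch.tanh(atanh_rho)
    cov = make_covariance_matrix(sigma, rho)
    dist = torch.distributions.MultivariateNormal(mu, cov)
    return -torch.mean(dist.log_prob(samples))

def closure():
    lbfgs.zero_grad()
    loss = negative_log_likelihood()
    loss.backward()
    return loss

with torch.no_grad():
    loss_before = negative_log_likelihood().item()
lbfgs.step(closure)
with torch.no_grad():
    loss_after = negative_log_likelihood().item()

print("mu: {}, mu_hat: {}".format(mu_true, mu.detach()))
print("sigma: {}, sigma_hat: {}".format(sigma_true, torch.exp(log_sigma).detach()))
print("rho: {}, rho_hat: {}".format(rho_true, torch.tanh(atanh_rho).detach()))
print("loss before: {:.4f}, after: {:.4f}".format(loss_before, loss_after))
```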

Hope this helps!
