<p>Welcome to PyTorch!</p>
<p>Here is how I would set up your training; please check the comments.</p>
<pre><code># how the community usually does the imports:
import torch  # some people do: import torch as th
import torch.nn as nn
import torch.optim as optim

if __name__ == '__main__':
    # setting some parameters:
    batch_size = 32
    n_dims = 128
    # select the GPU if one is available
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    # initializing a simple neural net
    net = nn.Sequential(nn.Linear(n_dims, n_dims // 2),  # batch norm is not usually applied directly to the input
                        nn.BatchNorm1d(n_dims // 2),  # batch norm goes before the activation function (it centers the input and helps make the dims of the previous layer independent of each other)
                        nn.ReLU(),  # the most common activation function
                        nn.Linear(n_dims // 2, 1))  # final layer
    net.to(device)  # the model is copied to the GPU if it is available
    optimizer = optim.SGD(net.parameters(), lr=0.01)  # it is better to start with a low lr and increase it in later experiments to avoid training divergence; the range [1.e-6, 5.e-2] is recommended.
    for i in range(10):
        # generating random data:
        board = torch.rand([batch_size, n_dims])
        # for sequences: [batch_size, channels, L]
        # for image data: [batch_size, channels, W, H]
        # for videos: [batch_size, channels, L, W, H]
        board = board.to(device)  # the data is copied to the GPU if it is available
        optimizer.zero_grad()  # the convention the community uses, though the result is the same as net.zero_grad()
        nn_outputs = net(board)  # don't call net.forward(x), call net(x): PyTorch applies some hooks in net.__call__(x) that are needed for backpropagation.
        loss = ((nn_outputs - 1)**2).mean()  # using .mean() makes your training less sensitive to the batch size.
        print(i, nn_outputs, loss.item())
        loss.backward()
        optimizer.step()
</code></pre>
<p>One comment about batch norm: for each dimension, it computes the mean and standard deviation over the batch (see the docs <a href="https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html#torch.nn.BatchNorm2d" rel="nofollow noreferrer">https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html#torch.nn.BatchNorm2d</a>):</p>
<pre><code>x_normalized = (x.mean(dim=0) / (x.std(dim=0) + e-6)) * scale + shift
</code></pre>
<p>where <code>scale</code> and <code>shift</code> are learnable parameters. If only one example is given per batch, <code>x.std(0) = 0</code>, which would make <code>x_normalized</code> degenerate.</p>
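<p>A minimal sketch (assuming a freshly constructed <code>nn.BatchNorm1d</code>, so <code>scale</code> starts at 1 and <code>shift</code> at 0) that checks the formula against PyTorch and shows how the one-example-per-batch case is handled. Note that PyTorch keeps eps inside the square root (<code>sqrt(var + eps)</code>) and uses the biased std, so the manual version only agrees up to a small tolerance:</p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand([32, 8])

bn = nn.BatchNorm1d(8)  # fresh layer: scale = 1, shift = 0
bn.train()
out_bn = bn(x)

# manual normalization following the formula above
out_manual = (x - x.mean(dim=0)) / (x.std(dim=0, unbiased=False) + 1e-5)
print('max difference:', (out_bn - out_manual).abs().max().item())  # tiny

# with a batch of one, the per-batch std is undefined, so in training
# mode PyTorch raises an error instead of emitting degenerate values
single = torch.rand([1, 8])
try:
    bn(single)
except ValueError as err:
    print('train mode:', err)

# in eval mode the stored running statistics are used instead of the
# batch statistics, so a single example works fine
bn.eval()
print('eval mode output shape:', bn(single).shape)
```

This is also why batch norm behaves differently between <code>net.train()</code> and <code>net.eval()</code>: at evaluation time it normalizes with running estimates collected during training rather than with the current batch.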