PyTorch tutorial notes: the workflow for building a network

Building a network

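Neural networks can be built with the torch.nn package. Now that you have seen autograd: nn relies on autograd to define models and differentiate them. An nn.Module contains layers, and its forward(input) method returns the output.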

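A typical training procedure consists of the following steps:
1. Define a network that has some trainable parameters (weights).
2. Iterate over a dataset of inputs.
3. Pass each input through the network to compute the output.
4. Compute the loss.
5. Backpropagate to compute the gradients.
6. Update the weights.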

    weight = weight - learning_rate * gradient
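The six steps above can be sketched without any framework. The example below is only an illustration under stated assumptions: the toy data (generated from y = 2 * x), the one-parameter model, and the hand-derived gradient are all made up for this sketch and are not part of the original tutorial.

```python
# Toy data generated from y = 2 * x (an assumption made for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]

weight = 0.0            # step 1: a "network" with a single trainable parameter
learning_rate = 0.01

for epoch in range(200):
    for x, y in zip(xs, ys):                 # step 2: iterate over the dataset
        output = weight * x                  # step 3: compute the output
        loss = (output - y) ** 2             # step 4: squared-error loss
        gradient = 2 * (output - y) * x      # step 5: d(loss)/d(weight), derived by hand
        weight = weight - learning_rate * gradient  # step 6: update the weight

print(round(weight, 3))  # 2.0
```

In PyTorch, step 5 is what autograd does for you, so the gradient never has to be derived by hand.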

Defining a network

Let's define the network:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # 1 input image channel, 6 output channels, 5x5 convolution kernel
            self.conv1 = nn.Conv2d(1, 6, 5)
            self.conv2 = nn.Conv2d(6, 16, 5)
            # an affine operation: y = Wx + b
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            # max pooling over a (2, 2) window
            x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
            # for a square window, a single number 2 is equivalent to (2, 2)
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)
            # flatten everything except the batch dimension
            x = x.view(-1, self.num_flat_features(x))
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

        def num_flat_features(self, x):
            size = x.size()[1:]  # all dimensions except the batch dimension
            num_features = 1
            for s in size:
                num_features *= s
            return num_features


    net = Net()
    print(net)

Output:

    Net(
      (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
      (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      (fc1): Linear(in_features=400, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )

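You only need to define the forward function. The backward function, where the gradients are computed, is defined for you automatically by autograd. You can use any Tensor operation in forward.
The learnable parameters of a model are returned by net.parameters():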

    params = list(net.parameters())
    print(len(params))
    print(params[0].size())  # conv1's .weight

Output:

    10
    torch.Size([6, 1, 5, 5])
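The count of 10 follows from each of the five layers contributing one weight tensor and one bias tensor. As a rough cross-check in plain Python, the sketch below writes out the shapes implied by the layer definitions (Conv2d weights are out_channels x in_channels x kernel x kernel, Linear weights are out_features x in_features) and sums their sizes. The dictionary is written by hand for illustration and is not read from the model.

```python
from math import prod

# Parameter shapes implied by the layer definitions of Net (written out by hand).
shapes = {
    "conv1.weight": (6, 1, 5, 5),  "conv1.bias": (6,),
    "conv2.weight": (16, 6, 5, 5), "conv2.bias": (16,),
    "fc1.weight": (120, 400),      "fc1.bias": (120,),
    "fc2.weight": (84, 120),       "fc2.bias": (84,),
    "fc3.weight": (10, 84),        "fc3.bias": (10,),
}

# Number of scalar values in each parameter tensor.
counts = {name: prod(shape) for name, shape in shapes.items()}
total = sum(counts.values())

print(len(shapes), total)  # 10 61706
```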

Let's try a random 32x32 input. Note that this network (LeNet) expects 32x32 inputs, so to use it on the MNIST dataset the images have to be resized to 32x32.

    input = torch.randn(1, 1, 32, 32)
    out = net(input)
    print(out)

Output:

    tensor([[-0.1346,  0.0581, -0.0396, -0.1136, -0.1128,  0.0180, -0.1226,
             -0.0419, -0.1150,  0.0278]])
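The 32x32 requirement comes from the 16*5*5 input size of fc1. The sketch below traces the spatial size through the two convolution and pooling stages using the standard output-size formula, assuming stride 1 and no padding for the convolutions and a pooling stride equal to the window size (the defaults used in the network above). The helper names are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Spatial output size of max pooling with stride == window, no padding."""
    return size // window


def flat_features(input_size, channels=16):
    """Number of features reaching fc1 for a square input of the given size."""
    size = input_size
    for kernel in (5, 5):            # conv1 and conv2 both use 5x5 kernels
        size = conv_out(size, kernel)
        size = pool_out(size, 2)     # each convolution is followed by 2x2 pooling
    return channels * size * size


print(flat_features(32))  # 400, matches nn.Linear(16 * 5 * 5, 120)
print(flat_features(28))  # 256, a raw 28x28 MNIST image would not fit fc1
```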

Zero the gradient buffers of all parameters, then backpropagate with random gradients:

    net.zero_grad()
    out.backward(torch.randn(1, 10))

torch.nn only supports mini-batches, not single samples.
For example, nn.Conv2d takes a 4-dimensional Tensor as input:

    nSamples x nChannels x Height x Width

If you have a single sample, use input.unsqueeze(0) to add a fake batch dimension.
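A minimal sketch of that call, assuming a single one-channel 32x32 sample filled with random values (the variable names are chosen only for this illustration):

```python
import torch

single = torch.randn(1, 32, 32)   # one sample: nChannels x Height x Width
batch = single.unsqueeze(0)       # insert a batch dimension of size 1 at position 0

print(tuple(single.shape), tuple(batch.shape))  # (1, 32, 32) (1, 1, 32, 32)
```

unsqueeze returns a new tensor and leaves the original shape unchanged.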
Before going further, let's recap the classes we have seen so far.

Recap:

torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
nn.Parameter - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.
autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least one Function node that connects to the functions that created the Tensor and encodes its history.

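So far, we have covered:
1. Defining a neural network
2. Processing inputs and calling backward
Still left:
1. Computing the loss
2. Updating the weights of the network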

Loss function

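A loss function takes an (output, target) pair as input and computes a value that estimates how far the output is from the target.
The nn package provides many loss functions. A simple one is nn.MSELoss, which computes the mean-squared error between the output and the target.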

For example:

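    output = net(input)
    # a dummy target; float dtype so that it matches the network output
    target = torch.arange(1, 11, dtype=torch.float)
    target = target.view(1, -1)  # give it the same shape as output
    criterion = nn.MSELoss()

    loss = criterion(output, target)
    print(loss)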

Output:

    tensor(39.1076)
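To make explicit what nn.MSELoss computes, the sketch below recomputes the loss in plain Python from the network output printed earlier and the dummy target 1..10. The output values are copied from the printed tensor and are therefore rounded to four decimals, so the result should only be expected to agree with the printed loss up to rounding.

```python
# Network output as printed earlier in these notes (rounded to four decimals).
output = [-0.1346, 0.0581, -0.0396, -0.1136, -0.1128,
          0.0180, -0.1226, -0.0419, -0.1150, 0.0278]
# The dummy target used in the example: 1, 2, ..., 10.
target = [float(t) for t in range(1, 11)]

# Mean-squared error: average of the squared element-wise differences.
squared_errors = [(o - t) ** 2 for o, t in zip(output, target)]
mse = sum(squared_errors) / len(squared_errors)

print(round(mse, 4))  # 39.1076
```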

Backprop

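To backpropagate the error, all we have to do is call loss.backward(). The existing gradients need to be cleared first, though, otherwise the new gradients are accumulated into them.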
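The accumulation can be seen on a single scalar. This is a small sketch on a hypothetical one-element tensor w, not on the network above:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w * w).backward()            # d(w^2)/dw = 2 * w = 4
first = w.grad.item()

(w * w).backward()            # a second call adds another 4 to the same buffer
accumulated = w.grad.item()

w.grad.zero_()                # clear the gradient buffer in place
(w * w).backward()
after_zeroing = w.grad.item()

print(first, accumulated, after_zeroing)  # 4.0 8.0 4.0
```

net.zero_grad() and optimizer.zero_grad() perform this clearing for every parameter of a model.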

Updating the weights

The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

    weight = weight - learning_rate * gradient

This rule can be implemented with a few lines of Python:

    learning_rate = 0.01
    for f in net.parameters():
        # in-place update: f = f - learning_rate * f.grad
        f.data.sub_(f.grad.data * learning_rate)

However, you may want to use other update rules as well, such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
The torch.optim package implements all of these, and it is simple to use:

    import torch.optim as optim

    # create your optimizer
    optimizer = optim.SGD(net.parameters(), lr=0.01)

    # in your training loop:
    optimizer.zero_grad()  # zero the gradient buffers
    output = net(input)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()  # does the update
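As a sanity check that optim.SGD without momentum applies the rule weight = weight - learning_rate * gradient, the sketch below compares one optimizer step with the hand-computed update. The tiny nn.Linear model and the random data are stand-ins chosen for this illustration, not the LeNet-style network from these notes.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

layer = nn.Linear(3, 1)                       # hypothetical tiny model
optimizer = optim.SGD(layer.parameters(), lr=0.01)
x = torch.randn(4, 3)                         # made-up mini-batch of 4 samples
target = torch.randn(4, 1)

optimizer.zero_grad()
loss = nn.MSELoss()(layer(x), target)
loss.backward()

# What the manual rule predicts for the weight after one step.
expected = layer.weight.detach().clone() - 0.01 * layer.weight.grad

optimizer.step()
matches = torch.allclose(layer.weight.detach(), expected)

print(matches)  # True
```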

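Note that the gradient buffers have to be zeroed manually, because gradients accumulate across calls to backward().
Now we call loss.backward() and look at the gradient of conv1's bias before and after the backward pass: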

    net.zero_grad()  # zero the gradient buffers of all parameters

    print('conv1.bias.grad before backward')
    print(net.conv1.bias.grad)

    loss.backward()

    print('conv1.bias.grad after backward')
    print(net.conv1.bias.grad)

Output:

    conv1.bias.grad before backward
    tensor([ 0.,  0.,  0.,  0.,  0.,  0.])
    conv1.bias.grad after backward
    tensor([ 0.1178, -0.0404, -0.0810,  0.0363, -0.0631,  0.1423])