Building neural network layers more simply with nn.Linear()

Recall the earlier multi-class classification problem.

There, every layer was written out by hand.

In fact, we can use nn.Linear instead; there is no need to write the layers ourselves.

Its first parameter is the input dimension and the second is the output dimension, which matches our usual way of thinking.
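A minimal sketch of a single layer (the 784 and 200 sizes are simply the ones used later in this post):

import torch
import torch.nn as nn

layer = nn.Linear(784, 200)   # in_features=784, out_features=200
x = torch.randn(4, 784)       # a batch of 4 flattened inputs
out = layer(x)
print(out.shape)              # torch.Size([4, 200])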

 

Adding the activation function:
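For example, the linear layers can be chained with ReLU through the functional interface or with the nn.ReLU module; a sketch using the functional form, with the same layer sizes as the network below:

import torch
import torch.nn as nn
import torch.nn.functional as F

layer1 = nn.Linear(784, 200)
layer2 = nn.Linear(200, 200)
layer3 = nn.Linear(200, 10)

x = torch.randn(4, 784)
x = F.relu(layer1(x), inplace=True)   # activation after each linear layer
x = F.relu(layer2(x), inplace=True)
x = F.relu(layer3(x), inplace=True)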

 

 

If we want to implement a network structure of our own:

There is no need to implement backward(); nn.Module provides it automatically, and PyTorch's autograd package takes care of the backward derivative computation.
class MLP(nn.Module):

    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 10),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x
In the training section:
net = MLP()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)

        logits = net(data)
        loss = criteon(logits, target)

        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()
In the hands-on example of Deep Learning and Neural Networks (4), we wrote the optimizer differently:

there, the w and b tensors were passed to the optimizer as a hand-written list of parameters.

Now our net inherits from nn.Module, so the w and b parameters of every layer are registered automatically and returned by net.parameters().
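A side-by-side sketch of the difference (w1, b1, ..., w3, b3 stand for the hand-written tensors from the earlier version):

# before: the hand-written tensors had to be listed explicitly
optimizer = optim.SGD([w1, b1, w2, b2, w3, b3], lr=learning_rate)

# now: nn.Module gathers the parameters of every nn.Linear for us
net = MLP()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)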

 

Rewriting the previous multi-class classification problem:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

batch_size = 200
learning_rate = 0.01
epochs = 10

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('dataset', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('dataset', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True)


class MLP(nn.Module):

    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 10),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


net = MLP()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):

    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)

        logits = net(data)
        loss = criteon(logits, target)

        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        logits = net(data)
        test_loss += criteon(logits, target).item()

        pred = logits.data.max(1)[1]
        correct += pred.eq(target.data).sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

The training results are basically the same as before.

 

The earlier approach could run into initialization problems; that is not the case here.

Here the w and b parameters are managed by nn.Linear() and are not exposed to us, so we cannot initialize them directly.

When we use this interface, it comes with its own initialization scheme, so we do not have to worry about it.
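If a custom initialization is ever needed, the weights can still be reached through the module; a minimal sketch, assuming the MLP with the nn.Sequential model defined above:

net = MLP()
for m in net.model:
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight)   # e.g. He initialization for the weights
        nn.init.zeros_(m.bias)              # and zero biases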

 

Layers and structure of a fully connected network

Fully connected layers are also called linear layers.

When counting layers, the input layer is not counted, but the output layer is.

So this network has 4 layers.

If you ask how many hidden layers there are, the answer is 3.

When we talk about a particular layer, we generally mean that layer's weights together with that layer's output.

For the second layer, that means the second weight matrix together with the second layer's output.

 

 

This network is used to process a very simple dataset: the MNIST image dataset.

Each image in MNIST is 28*28 pixels, so the input is 28*28; MNIST has 10 classes in total, so the output has 10 nodes.

That is, the input is a 28*28 matrix; to make it easier for the fully connected layers to handle, we flatten it into a 784-dimensional vector, and each intermediate layer has 256 nodes.
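Flattening is just a reshape; a minimal sketch:

import torch

img = torch.randn(1, 28, 28)    # one MNIST-sized image
x = img.view(-1, 28 * 28)       # flattened to shape [1, 784] for the linear layers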

Let's work out how many parameters such a network needs.

The number of parameters in a neural network equals the number of connection lines between nodes (ignoring the biases).

784*256 + 256*256 + 256*256 + 256*10 = 334,336 ≈ 334K

So the network has roughly 334K parameters.

Each parameter is stored as a 4-byte floating-point number, so 334K * 4 bytes ≈ 1.3MB.

So we need about 1.3MB of memory, or of video memory if we use a GPU.
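We can verify the count directly in PyTorch; a small sketch for the 784→256→256→256→10 network described here (the first sum counts only the weight matrices, i.e. the "lines" above; the second also includes the biases):

import torch.nn as nn

net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 10),
)

weights = sum(p.numel() for name, p in net.named_parameters() if 'weight' in name)
print(weights)                     # 334336 weight entries
total = sum(p.numel() for p in net.parameters())
print(total)                       # 335114 including bias vectors
print(total * 4 / 1e6, 'MB')       # about 1.34 MB at 4 bytes per parameter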

 

That number looks small today.
However, MNIST was born in the 1980s, back in the era of the 386.
Machines at that time probably had only tens to hundreds of KB of memory.
So even such a simple network could not fit into memory.
 

 

 
