Building neural network layers more simply with nn.Linear()

Recall the earlier multi-class classification problem.

There, every layer was written out by hand.

In fact, we can use nn.Linear instead; there is no need to write the layers ourselves.

Its first parameter is the input dimension and the second is the output dimension, which matches our usual way of thinking.
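A minimal sketch of a single layer (the 784 and 200 sizes are simply the ones used later in this post):

import torch
import torch.nn as nn

layer = nn.Linear(784, 200)   # in_features=784, out_features=200
x = torch.randn(4, 784)       # a batch of 4 flattened inputs
out = layer(x)
print(out.shape)              # torch.Size([4, 200])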

 

Adding the activation function:
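For example, the linear layers can be chained with ReLU through the functional interface or with the nn.ReLU module; a sketch using the functional form, with the same layer sizes as the network below:

import torch
import torch.nn as nn
import torch.nn.functional as F

layer1 = nn.Linear(784, 200)
layer2 = nn.Linear(200, 200)
layer3 = nn.Linear(200, 10)

x = torch.randn(4, 784)
x = F.relu(layer1(x), inplace=True)   # activation after each linear layer
x = F.relu(layer2(x), inplace=True)
x = F.relu(layer3(x), inplace=True)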

 

 

If we want to implement a network structure of our own:

There is no need to implement backward(); nn.Module provides it automatically, and PyTorch's autograd package takes care of the backward derivative computation.
class MLP(nn.Module):

    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 10),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x
In the training section:
net = MLP()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)

        logits = net(data)
        loss = criteon(logits, target)

        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()
In the hands-on example of Deep Learning and Neural Networks (4), we wrote the optimizer differently:

there, the w and b tensors were passed to the optimizer as a hand-written list of parameters.

Now our net inherits from nn.Module, so the w and b parameters of every layer are registered automatically and returned by net.parameters().
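A side-by-side sketch of the difference (w1, b1, ..., w3, b3 stand for the hand-written tensors from the earlier version):

# before: the hand-written tensors had to be listed explicitly
optimizer = optim.SGD([w1, b1, w2, b2, w3, b3], lr=learning_rate)

# now: nn.Module gathers the parameters of every nn.Linear for us
net = MLP()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)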

 

Rewriting the previous multi-class classification problem:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

batch_size = 200
learning_rate = 0.01
epochs = 10

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('dataset', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('dataset', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True)


class MLP(nn.Module):

    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 10),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


net = MLP()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):

    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)

        logits = net(data)
        loss = criteon(logits, target)

        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        logits = net(data)
        test_loss += criteon(logits, target).item()

        pred = logits.data.max(1)[1]
        correct += pred.eq(target.data).sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

The training results are basically the same as before.

 

The earlier approach could run into initialization problems; that is not the case here.

Here the w and b parameters are managed by nn.Linear() and are not exposed to us, so we cannot initialize them directly.

When we use this interface, it comes with its own initialization scheme, so we do not have to worry about it.
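If a custom initialization is ever needed, the weights can still be reached through the module; a minimal sketch, assuming the MLP with the nn.Sequential model defined above:

net = MLP()
for m in net.model:
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight)   # e.g. He initialization for the weights
        nn.init.zeros_(m.bias)              # and zero biases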

 

Layers and structure of a fully connected network

Fully connected layers are also called linear layers.

When counting layers, the input layer is not counted, but the output layer is.

So this network has 4 layers.

If you ask how many hidden layers there are, the answer is 3.

When we talk about a particular layer, we generally mean that layer's weights together with that layer's output.

For the second layer, that means the second weight matrix together with the second layer's output.

 

 

This network is used to process a very simple dataset: the MNIST image dataset.

Each image in MNIST is 28*28 pixels, so the input is 28*28; MNIST has 10 classes in total, so the output has 10 nodes.

That is, the input is a 28*28 matrix; to make it easier for the fully connected layers to handle, we flatten it into a 784-dimensional vector, and each intermediate layer has 256 nodes.
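Flattening is just a reshape; a minimal sketch:

import torch

img = torch.randn(1, 28, 28)    # one MNIST-sized image
x = img.view(-1, 28 * 28)       # flattened to shape [1, 784] for the linear layers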

Let's work out how many parameters such a network needs.

The number of parameters in a neural network equals the number of connection lines between nodes (ignoring the biases).

784*256 + 256*256 + 256*256 + 256*10 = 334,336 ≈ 334K

So the network has roughly 334K parameters.

Each parameter is stored as a 4-byte floating-point number, so 334K * 4 bytes ≈ 1.3MB.

So we need about 1.3MB of memory, or of video memory if we use a GPU.
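We can verify the count directly in PyTorch; a small sketch for the 784→256→256→256→10 network described here (the first sum counts only the weight matrices, i.e. the "lines" above; the second also includes the biases):

import torch.nn as nn

net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 10),
)

weights = sum(p.numel() for name, p in net.named_parameters() if 'weight' in name)
print(weights)                     # 334336 weight entries
total = sum(p.numel() for p in net.parameters())
print(total)                       # 335114 including bias vectors
print(total * 4 / 1e6, 'MB')       # about 1.34 MB at 4 bytes per parameter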

 

That number looks small today.
However, MNIST was born in the 1980s, back in the era of the 386.
Machines at that time probably had only tens to hundreds of KB of memory.
So even such a simple network could not fit into memory.
 

 

 
