Deep learning 5: Using a pretrained model (1)

If the dataset is too small, overfitting remains a problem no matter how the data is processed, so accuracy stays limited. This is where a pretrained model comes in. A pretrained model has usually been trained on a large amount of data, and it should be chosen so that the problem it was trained on is similar to the problem at hand.
Here we use VGG16, a simple and fairly old model whose architecture is very similar to the one we used before. There are two ways to use a pretrained network: feature extraction and fine-tuning.

First, recall from the previous posts that a convolutional neural network consists of two parts. The first part is made up of convolution and pooling layers; the second part is a densely connected classifier. We simply call the first part the convolutional base. The classifier at the end is specific to the classes the model was trained on, so it does not generalize well to other problems, which is easy to see with a little thought. For that reason, it is usually the convolutional base that gets reused.

The similarity of the problems also matters. Generally speaking, the earlier a layer sits in the network, the simpler and more generic the features it learns. So the more similar the new problem is to the original one, the more layers of the existing model can be reused; if the similarity is low, consider using only the first few layers of the existing model.
Now let's look at the concrete implementation.
from keras.applications import VGG16
# Load the VGG16 convolutional base with ImageNet weights, without the top classifier
conv_base = VGG16(weights='imagenet', include_top=False,
                  input_shape=(150, 150, 3))
This imports VGG16 and sets the parameters of the convolutional base. For weights, None means random initialization, that is, no pretrained weights are loaded, while 'imagenet' loads the weights pretrained on ImageNet. include_top controls whether the second part, the classifier, is included. It is not used here: since this is a cats-vs-dogs problem, we will add our own classifier. Finally, input_shape is the size of the input images.
Use the summary method to look at the network architecture. It should look familiar:

Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         (None, 150, 150, 3)       0
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0

Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
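
The parameter counts and output sizes in the summary can be checked by hand. The following is a minimal sketch in plain Python, assuming 3x3 kernels with 'same' padding and 2x2 max pooling (the standard VGG16 configuration). The names VGG16_BLOCKS and conv_base_stats are made up for this illustration and are not part of Keras.

```python
# Filters in each conv layer of VGG16, grouped by block.
# Every block ends with a 2x2 max pooling layer.
VGG16_BLOCKS = [[64, 64], [128, 128], [256, 256, 256],
                [512, 512, 512], [512, 512, 512]]

def conv_base_stats(size=150, channels=3, kernel=3):
    """Return (final_size, final_channels, total_params) for the conv base."""
    total_params = 0
    for block in VGG16_BLOCKS:
        for filters in block:
            # One kernel x kernel x channels weight tensor per filter,
            # plus one bias per filter.
            total_params += kernel * kernel * channels * filters + filters
            channels = filters  # 'same' padding keeps the spatial size
        size //= 2  # 2x2 max pooling halves the size, rounding down
    return size, channels, total_params

print(conv_base_stats())  # (4, 512, 14714688)
```

The first convolution, for example, has 3 * 3 * 3 * 64 + 64 = 1792 parameters, and the spatial size goes 150, 75, 37, 18, 9, 4 across the five pooling layers, matching the summary.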
The final output has shape (4, 4, 512), so we add a densely connected classifier on top of it to solve the binary classification problem. The question now is how best to set up and train the model.

If the convolutional base is only run over the data once and a classifier is trained on the recorded features, data augmentation cannot be used. To use augmentation, you have to add Dense layers on top of the model and run the whole thing end to end, which significantly increases the computational cost of training. Both approaches, without and with data augmentation, will be analyzed later.
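
As a rough preview of what adding a classifier on top of the convolutional base could look like, here is a minimal sketch. The helper name build_model, the hidden layer size of 256, and the choice to freeze the base are assumptions made for illustration, not something settled in this post.

```python
from keras import layers, models
from keras.applications import VGG16

def build_model(weights='imagenet'):
    """Stack a small binary classifier on a frozen VGG16 convolutional base."""
    conv_base = VGG16(weights=weights, include_top=False,
                      input_shape=(150, 150, 3))
    # Assumption: keep the pretrained features fixed while the new
    # classifier is trained.
    conv_base.trainable = False

    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Flatten())                       # (4, 4, 512) -> 8192 values
    model.add(layers.Dense(256, activation='relu'))   # hidden size is an assumption
    model.add(layers.Dense(1, activation='sigmoid'))  # probability for cat vs. dog
    return model
```

Calling build_model() downloads the ImageNet weights on first use. With the base frozen, only the two Dense layers are updated during training, but every batch still has to pass through the full convolutional base, which is where the extra cost comes from.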
