If the dataset is too small, the model will overfit no matter how the data is processed, and accuracy will stay low. This is where a pre-trained model comes in. A pre-trained model is usually trained on a large amount of data, and ideally the chosen pre-trained model's original task is similar to the problem at hand.
Here we use a simple, classic model, VGG16, whose architecture is very similar to the ones we used before. There are two ways to use a pre-trained network: feature extraction and fine-tuning.
First, recall from earlier study that a convolutional neural network consists of two parts: the first is made up of convolutional and pooling layers, and the second is a densely connected classifier. The first part is simply called the convolutional base. The classifier is tied to the classes the model was originally trained on, so it does not generalize well; this is easy to see with a little thought. The convolutional base, by contrast, learns more generic features, so it is the part that is usually reused.
Problem similarity also matters. Logically, the earlier the layer, the simpler and more generic the features it learns. So the higher the similarity between the new problem and the original task, the more layers of the existing model can be reused; if the similarity is low, consider reusing only the earlier layers of the existing model.
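As a sketch of this idea (a hypothetical example, assuming the standalone Keras API; `weights=None` is used here only so the sketch does not download the pre-trained weights, whereas in practice you would pass `weights='imagenet'`), an existing model can be truncated at an intermediate pooling layer so that only its earlier, more generic layers are reused:

```python
from keras import models
from keras.applications import VGG16

# Build VGG16 without its classifier. weights=None means random
# initialization; in real use, weights='imagenet' loads pre-trained weights.
conv_base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))

# Hypothetical scenario: the new problem is not very similar to the
# original task, so keep only the layers up to block3_pool and discard
# the deeper, more task-specific ones.
partial_base = models.Model(inputs=conv_base.input,
                            outputs=conv_base.get_layer('block3_pool').output)
```

The layer name `block3_pool` comes from VGG16's own layer naming; cutting at a deeper block (e.g. `block4_pool`) would keep more of the original model.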
Now let's look at the concrete implementation.
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))
This imports VGG16 and sets the convolutional base's parameters. For weights, None means random initialization, i.e. no pre-trained weights are loaded, while 'imagenet' loads the weights pre-trained on ImageNet. include_top indicates whether to include the second part, the classifier; it is not used here, because this is a dogs-vs-cats problem and we will add our own classifier. Finally, input_shape is the size of the input.
Use the summary method to look at the network architecture, which turns out to be familiar:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 150, 150, 3) 0
block1_conv1 (Conv2D) (None, 150, 150, 64) 1792
block1_conv2 (Conv2D) (None, 150, 150, 64) 36928
block1_pool (MaxPooling2D) (None, 75, 75, 64) 0
block2_conv1 (Conv2D) (None, 75, 75, 128) 73856
block2_conv2 (Conv2D) (None, 75, 75, 128) 147584
block2_pool (MaxPooling2D) (None, 37, 37, 128) 0
block3_conv1 (Conv2D) (None, 37, 37, 256) 295168
block3_conv2 (Conv2D) (None, 37, 37, 256) 590080
block3_conv3 (Conv2D) (None, 37, 37, 256) 590080
block3_pool (MaxPooling2D) (None, 18, 18, 256) 0
block4_conv1 (Conv2D) (None, 18, 18, 512) 1180160
block4_conv2 (Conv2D) (None, 18, 18, 512) 2359808
block4_conv3 (Conv2D) (None, 18, 18, 512) 2359808
block4_pool (MaxPooling2D) (None, 9, 9, 512) 0
block5_conv1 (Conv2D) (None, 9, 9, 512) 2359808
block5_conv2 (Conv2D) (None, 9, 9, 512) 2359808
block5_conv3 (Conv2D) (None, 9, 9, 512) 2359808
block5_pool (MaxPooling2D) (None, 4, 4, 512) 0
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
The final output has shape (4, 4, 512), so by adding a densely connected classifier on top of it we can solve the binary classification problem. The question now is how to train the model efficiently.
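A minimal sketch of this step, assuming the standalone Keras API (`weights=None` is used only to skip the weight download; in real use it would be `weights='imagenet'`):

```python
from keras import models, layers
from keras.applications import VGG16

conv_base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))

# Stack a small densely connected classifier on top of the (4, 4, 512)
# output of the convolutional base, ending in a single sigmoid unit for
# binary (cat vs. dog) classification.
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

# Freeze the convolutional base so its weights are not destroyed
# while the randomly initialized classifier is being trained.
conv_base.trainable = False
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

Freezing matters: with the base frozen, only the two Dense layers (kernel and bias each) remain trainable.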
Used this way, as a standalone feature extractor, the model cannot take advantage of data augmentation; to use augmentation, the dense layers have to be added on top of the model itself and trained end to end, which greatly increases the computational cost of training. The two methods, without and with data augmentation, will be analyzed later.
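A sketch of the first method, fast feature extraction without augmentation. Random arrays stand in for the real preprocessed cat/dog images (hypothetical data), and `weights=None` again only avoids the weight download:

```python
import numpy as np
from keras import models, layers
from keras.applications import VGG16

conv_base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))

# Stand-in for a batch of preprocessed images (hypothetical data).
images = np.random.rand(8, 150, 150, 3).astype('float32')

# Run the images through the convolutional base ONCE and cache the result;
# the small classifier then trains on these fixed features. This is cheap,
# but because the features are precomputed, no data augmentation is possible.
features = conv_base.predict(images)
flat = features.reshape(len(features), 4 * 4 * 512)

classifier = models.Sequential([
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
classifier.compile(optimizer='rmsprop', loss='binary_crossentropy')
preds = classifier.predict(flat)
```

In a real run, `classifier.fit(flat, labels, ...)` would replace the `predict` call; the point is that `conv_base` is never run again during training.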