- 2019-06-12 20:27
- Softmax
- Deep learning and computer vision

1. Basic content

Softmax converts the score values of a linear classifier into probabilities for multi-class classification: where the SVM outputs raw score values, Softmax outputs probabilities.

2. Sigmoid function

Expression (the output range is (0, 1)):

σ(x) = 1 / (1 + e^(-x))

Function image: (the original figure, an S-shaped curve, is not reproduced here)

The sigmoid function maps any real number onto the interval (0, 1), so its output can be read as a probability, and classification is decided according to the size of that probability.
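As a minimal sketch, this mapping can be written directly in Python:

```python
import math

def sigmoid(x):
    """Map any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5
print(sigmoid(4.0))   # close to 1
print(sigmoid(-4.0))  # close to 0
```

Large positive inputs saturate near 1 and large negative inputs near 0, which is why the output can be thresholded (e.g. at 0.5) for binary classification.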

3. The output of Softmax

The softmax function takes as input a vector of arbitrary real-valued scores and outputs a vector in which every element lies between 0 and 1 and all elements sum to 1 (a normalized classification probability):

S(y_i) = e^(y_i) / Σ_j e^(y_j)
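A small self-contained sketch of the function (subtracting the maximum score first is a standard numerical-stability trick, not something required by the formula itself):

```python
import math

def softmax(scores):
    """Normalize a score vector into probabilities that sum to 1.

    Subtracting the max score before exp() avoids overflow without
    changing the result.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([3.0, 1.0, 0.2])
print(probs)        # each value lies in (0, 1)
print(sum(probs))   # sums to 1.0
```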

Loss function: cross-entropy loss (cross-entropy loss):

L_i = -log( e^(s_{y_i}) / Σ_j e^(s_j) )

where s_j are the class scores for the example and y_i is its correct class.

The earlier cat-classification example is again used for the calculation:

Exponentiation maps relatively large scores to much larger values and negative scores to values close to zero. L_i is the loss value, computed from the probability assigned to the correct class.
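The original worked figure is not available, so here is the same calculation with illustrative, made-up scores (the numbers and class names are assumptions, not the article's data):

```python
import math

# Hypothetical raw class scores for one image
scores = [3.2, 5.1, -1.7]   # e.g. cat, dog, frog
correct = 0                 # index of the correct class ("cat")

exps = [math.exp(s) for s in scores]   # exponentiation spreads the scores out
probs = [e / sum(exps) for e in exps]  # normalized probabilities
loss = -math.log(probs[correct])       # L_i = -log P(correct class)

print(round(probs[correct], 3))  # ~0.13: low confidence in the right class
print(round(loss, 3))            # ~2.04: correspondingly high loss
```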

4. Comparison of the SVM and Softmax loss functions

With hinge loss, when a wrong class scores nearly as high as the correct class, the model's quality cannot be judged accurately: the loss value can be close to 0 even though the model barely separates the classes. For this reason hinge loss is not used here.
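A small comparison sketch makes the point concrete (the scores are made-up; the hinge margin of 1.0 is the usual convention):

```python
import math

def hinge_loss(scores, correct, margin=1.0):
    """Multiclass SVM (hinge) loss for one example."""
    return sum(max(0.0, s - scores[correct] + margin)
               for i, s in enumerate(scores) if i != correct)

def softmax_loss(scores, correct):
    """Cross-entropy loss for one example."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[correct] / sum(exps))

# The correct class beats the others by exactly the margin:
scores = [2.0, 1.0, 1.0]
print(hinge_loss(scores, 0))    # 0.0 -- the SVM is already "satisfied"
print(softmax_loss(scores, 0))  # still > 0 -- softmax keeps pushing for confidence
```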

5. Optimization

Input data combined with a set of weight parameters yields a set of score values and, finally, a loss value; this process is called forward propagation. Updating the weight parameters from the loss value is done by the backpropagation algorithm.

5.1 Gradient descent (reach the lowest point as quickly as possible)

Gradient update formula (η is the learning rate):

w := w − η · ∂L/∂w

Gradient descent code implementation:

batch_size (the number of samples drawn from the full data set each step) is usually a power of 2 (32, 64, 128); within the limits of the machine's memory, larger is generally better. step_size is the learning rate (it should not be too large).
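The original code figure is missing; the following is a minimal mini-batch gradient-descent sketch on a toy one-parameter regression problem (the data, loss, and gradient are placeholders standing in for a real network and backprop):

```python
import random

random.seed(0)

def compute_gradient(w, batch):
    """Gradient of a toy squared-error loss L = mean((w*x - y)^2)."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(x, 3.0 * x) for x in range(100)]  # toy data whose true weight is 3.0
w = 0.0
step_size = 1e-4   # learning rate; too large a value makes training diverge
batch_size = 32    # usually a power of 2: 32, 64, 128

for step in range(1000):
    batch = random.sample(data, batch_size)  # draw one mini-batch
    grad = compute_gradient(w, batch)
    w -= step_size * grad                    # update: w := w - lr * dL/dw

print(round(w, 2))  # approaches the true weight 3.0
```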

Visualization of the loss value while training the network:

The curve fluctuates locally but trends downward overall, which shows the network is learning. (An epoch is one full pass over the data set; one iteration processes only a single batch of batch_size samples.)

5.2 Backpropagation

The picture above shows forward propagation; going back from L to update W is called backpropagation. An example follows:

Suppose there are three inputs x, y, z which, after a series of operations, produce a loss value f. We now want to compute how much each input contributes to f, i.e. find the partial derivatives of f with respect to each of them.

Chain rule (with q an intermediate variable between x and f):

∂f/∂x = (∂f/∂q) · (∂q/∂x)
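The original worked figure is unavailable, so here is the chain rule applied to an assumed concrete example, f(x, y, z) = (x + y) · z with intermediate q = x + y:

```python
# Forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass (chain rule)
df_dq = z          # df/dq = z  = -4
df_dz = q          # df/dz = q  =  3
df_dx = df_dq * 1  # dq/dx = 1, so df/dx = df/dq * dq/dx = -4
df_dy = df_dq * 1  # dq/dy = 1, so df/dy = df/dq * dq/dy = -4

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```

Each local gate only needs its own local derivative; multiplying by the upstream gradient chains the contributions all the way back to the inputs.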

The backpropagation of a more complex function proceeds in the same way:

Simplification:

Meaning of gate units:
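The original figure is unavailable; the standard interpretation of the common gates can be sketched as follows (the inputs and upstream gradient are made-up numbers):

```python
x, y = 3.0, -1.0
upstream = 2.0  # gradient flowing back into the gate from above

# add gate: distributes the upstream gradient unchanged to both inputs
dx_add, dy_add = upstream, upstream

# multiply gate: the local gradients are the *swapped* inputs
dx_mul, dy_mul = upstream * y, upstream * x

# max gate: routes the full gradient to the larger input, zero to the other
dx_max = upstream if x > y else 0.0
dy_max = upstream if y >= x else 0.0

print(dx_add, dy_add)  # 2.0 2.0
print(dx_mul, dy_mul)  # -2.0 6.0
print(dx_max, dy_max)  # 2.0 0.0
```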
