I. The RNN network

1. RNN parameter details in PyTorch

rnn = nn.RNN(*args, **kwargs)
(1) input_size: the dimension of the input x_t.
(2) hidden_size: the dimension of the hidden output h_t.
(3) num_layers: the number of layers of the network; defaults to 1.
(4) nonlinearity: the nonlinear activation function; defaults to tanh, relu can also be chosen.
(5) bias: whether to use a bias; defaults to True.
(6) batch_first: determines the dimension order of the network input. The default order is (seq, batch, feature); if this parameter is set to True, the order becomes (batch, seq, feature). Note that an RNN puts the batch in the second dimension by default, whereas a CNN puts the batch in the first dimension (a short sketch of batch_first follows this list).
(7) dropout: accepts a value between 0 and 1; adds a dropout layer to the output of every layer except the last.
(8) bidirectional: defaults to False, i.e. a unidirectional recurrent network; if set to True, the network is a bidirectional recurrent network.
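As a quick illustration of batch_first, the minimal sketch below builds two otherwise identical single-layer RNNs and feeds them the same data in the two layouts. The module and argument names are standard PyTorch; the concrete sizes (seq=5, batch=3, feature=10, hidden=20) are arbitrary values chosen for this example.

import torch
import torch.nn as nn

# Default layout: input is (seq, batch, feature)
rnn_seq_first = nn.RNN(input_size=10, hidden_size=20)
x = torch.randn(5, 3, 10)                        # seq=5, batch=3, feature=10
out, h_n = rnn_seq_first(x)
print(out.size())                                # torch.Size([5, 3, 20])

# batch_first=True: input is (batch, seq, feature)
rnn_batch_first = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(3, 5, 10)                        # batch=3, seq=5, feature=10
out, h_n = rnn_batch_first(x)
print(out.size())                                # torch.Size([3, 5, 20])

# batch_first only changes the layout of the input/output tensors;
# h_0 and h_n keep the shape (num_layers*num_directions, batch, hidden_size)
print(h_n.size())                                # torch.Size([1, 3, 20])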

2. Input and output dimensions

(1) The inputs to the network are a sequence input x_t and an initial memory (hidden) state h_0. x_t has dimension (seq, batch, feature). h_0, the hidden state, has dimension (layers*direction, batch, hidden): the number of layers multiplied by the number of directions (1 for unidirectional, 2 for bidirectional), the batch size, and the hidden dimension.
(2) The network outputs output and h_t. output is the actual output of the network, with dimension (seq, batch, hidden*direction): the sequence length, the batch size, and the hidden dimension multiplied by the number of directions. h_t is the memory unit, with dimension (layers*direction, batch, hidden): the number of layers multiplied by the number of directions, the batch size, and the hidden dimension.

3. Points to note

(1) The network output has dimension (seq, batch, hidden*direction), where direction = 1 or 2. In a bidirectional network, the sequence is effectively processed twice, once from left to right and once from right to left, producing two results. Concatenating the two results along the last dimension gives the dimension of output (the sketch after these notes checks this).
(2) The hidden state, both as input and as output, has size (layer*direction, batch, hidden). If the network has multiple layers, each layer has its own memory unit, and a bidirectional network has two different memory units per layer, so the first dimension is layer*direction.
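The shape rules above can be verified directly. The following is a minimal sketch, assuming an arbitrary bidirectional configuration with 3 layers; the printed shapes follow the (seq, batch, hidden*direction) and (layer*direction, batch, hidden) conventions described above.

import torch
import torch.nn as nn

seq_len, batch, feature, hidden, layers = 100, 32, 20, 50, 3

birnn = nn.RNN(input_size=feature, hidden_size=hidden,
               num_layers=layers, bidirectional=True)
x = torch.randn(seq_len, batch, feature)
output, h_n = birnn(x)

print(output.size())  # torch.Size([100, 32, 100]) -> (seq, batch, hidden*2)
print(h_n.size())     # torch.Size([6, 32, 50])    -> (layers*2, batch, hidden)

# The last dimension of output is the left-to-right and right-to-left
# results concatenated together, so it splits back into two halves:
forward_out = output[..., :hidden]   # (100, 32, 50)
backward_out = output[..., hidden:]  # (100, 32, 50)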

4. RNN code implementation
import torch
from torch.autograd import Variable
import torch.nn as nn

rnn = nn.RNN(input_size=20, hidden_size=50, num_layers=2)
input_data = Variable(torch.randn(100, 32, 20))  # (seq, batch, feature)
# If no hidden state is passed to the network, it defaults to all zeros
h_0 = Variable(torch.randn(2, 32, 50))  # (layer*direction, batch, hidden_size)
output, h_t = rnn(input_data, h_0)
print(output.size())  # (seq, batch, hidden_size)
print(h_t.size())     # (layer*direction, batch, hidden_size)
print(rnn.weight_ih_l0.size())
Printed results:
torch.Size([100, 32, 50])
torch.Size([2, 32, 50])
torch.Size([50, 20])

II. The LSTM network

1. Introduction to LSTM

rnn = nn.LSTM(*args, **kwargs)
LSTM is not used in exactly the same way as the standard RNN; the main differences between the two are introduced below:

(1) LSTM has four times as many parameters as the RNN.
This is because LSTM has three more linear transformations than the standard RNN. The weights of these linear transformations are stored together, so the weight matrix is 4 times as large, and the same goes for the bias (a sketch that splits the weights apart follows this list).
(2) The input and output have one more memory unit.
The input of LSTM is no longer just the sequence input and the hidden state: besides the hidden state h_0, there is also a cell state c_0. Together they form the hidden state of the network, and they have exactly the same size, (layer*direction, batch, hidden). Correspondingly, the outputs include both h_t and c_t.
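To make the "four times" point concrete, the gate weights of an LSTM are stacked along the first dimension of weight_ih_l0 (shape (4*hidden_size, input_size)), so they can be split back apart with torch.chunk. The sketch below reuses the sizes from this article; note that the exact gate ordering inside the stacked matrix (input, forget, cell, output in current PyTorch) is an implementation detail worth checking against the documentation for your version.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=20, hidden_size=50, num_layers=2)

# The four gate weights are stacked: shape (4*hidden_size, input_size)
print(lstm.weight_ih_l0.size())            # torch.Size([200, 20])

# Split back into four per-gate matrices, each (hidden_size, input_size)
w_ii, w_if, w_ig, w_io = torch.chunk(lstm.weight_ih_l0, 4, dim=0)
print(w_ii.size())                         # torch.Size([50, 20])

# For comparison, a plain RNN with the same sizes has a single matrix
rnn = nn.RNN(input_size=20, hidden_size=50, num_layers=2)
print(rnn.weight_ih_l0.size())             # torch.Size([50, 20])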

2. LSTM code implementation
# Define the network
lstm = nn.LSTM(input_size=20, hidden_size=50, num_layers=2)
input_data = Variable(torch.randn(100, 32, 20))   # input variable
h_0 = Variable(torch.randn(2, 32, 50))            # initial hidden state
c_0 = Variable(torch.randn(2, 32, 50))            # initial cell state
output, (h_t, c_t) = lstm(input_data, (h_0, c_0))
print(output.size())
print(h_t.size())
print(c_t.size())
# The parameter size is (50*4, 20), four times that of the RNN
print(lstm.weight_ih_l0)
print(lstm.weight_ih_l0.size())
Printed results:
torch.Size([100, 32, 50])
torch.Size([2, 32, 50])
torch.Size([2, 32, 50])
Parameter containing:
tensor([[ 0.0068, -0.0925, -0.0343, …, -0.1059, 0.0045, -0.1335],
[-0.0509, 0.0135, 0.0100, …, 0.0282, -0.1232, 0.0330],
[-0.0425, 0.1392, 0.1140, …, -0.0740, -0.1214, 0.1087],
…,
[ 0.0217, -0.0032, 0.0815, …, -0.0605, 0.0636, 0.1197],
[ 0.0144, 0.1288, -0.0569, …, 0.1361, 0.0837, -0.0021],
[ 0.0355, 0.1045, 0.0339, …, 0.1412, 0.0371, 0.0649]],
requires_grad=True)
torch.Size([200, 20])

III. The GRU network

1. Introduction to GRU

rnn = nn.GRU(*args, **kwargs)
GRU combines the input gate and the forget gate of LSTM into a single gate. It differs from LSTM in the following 2 points:
(1) GRU has three times as many parameters as the RNN, versus four times for LSTM (a small size comparison follows this list).
(2) There is only one hidden state h_0.
As its structure shows, the hidden state of GRU is no longer h_0 and c_0; there is only one h_0, and correspondingly only one output h_t.
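A quick way to see the 1x / 3x / 4x relationship between the three modules is to compare their first-layer input weights side by side. The sketch below is a small check, assuming the same sizes used throughout this article (input_size=20, hidden_size=50).

import torch.nn as nn

sizes = dict(input_size=20, hidden_size=50, num_layers=2)
rnn, gru, lstm = nn.RNN(**sizes), nn.GRU(**sizes), nn.LSTM(**sizes)

print(rnn.weight_ih_l0.size())   # torch.Size([50, 20])  -> 1 * hidden_size rows
print(gru.weight_ih_l0.size())   # torch.Size([150, 20]) -> 3 * hidden_size rows
print(lstm.weight_ih_l0.size())  # torch.Size([200, 20]) -> 4 * hidden_size rows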

2. Code implementation
gru = nn.GRU(input_size=20, hidden_size=50, num_layers=2)
input_data = Variable(torch.randn(100, 32, 20))  # input variable
h_0 = Variable(torch.randn(2, 32, 50))           # initial hidden state
# The initial hidden state is optional: gru(input_data, h_0) passes it
# explicitly, while gru(input_data) defaults it to all zeros
output, h_n = gru(input_data)
print(output.size())
print(h_n.size())
print(gru.weight_ih_l0)
print(gru.weight_ih_l0.size())
Printed results:
torch.Size([100, 32, 50])
torch.Size([2, 32, 50])
Parameter containing:
tensor([[ 0.0878, 0.0383, -0.0261, …, 0.0801, -0.0932, -0.1267],
[ 0.0275, 0.1129, -0.0306, …, -0.0837, 0.0824, -0.1332],
[ 0.1061, -0.0786, -0.0163, …, -0.0622, -0.0350, -0.0417],
…,
[-0.0923, -0.0106, -0.0196, …, 0.0944, 0.0085, 0.0387],
[-0.0181, 0.0431, -0.1382, …, -0.1383, 0.0229, 0.1021],
[-0.0962, 0.0980, -0.0306, …, 0.0871, -0.0827, -0.0811]],
requires_grad=True)
torch.Size([150, 20])

PyTorch also provides RNNCell, LSTMCell, and GRUCell. These are the single-step versions of the three modules above: their input is no longer a whole sequence, but a single step of a sequence, i.e. one iteration of the recurrent network. They are more flexible when working with sequences, because every step of the sequence is carried out manually, so custom operations can be added on top of each step (see the sketch below).
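As a rough sketch of how the single-step versions are used, the loop below feeds a sequence into nn.RNNCell one time step at a time, using the same arbitrary sizes as the examples above. Stacking the per-step hidden states reproduces the (seq, batch, hidden) output that nn.RNN would return; any custom per-step logic could be inserted inside the loop.

import torch
import torch.nn as nn

seq_len, batch, feature, hidden = 100, 32, 20, 50

cell = nn.RNNCell(input_size=feature, hidden_size=hidden)
inputs = torch.randn(seq_len, batch, feature)
h = torch.zeros(batch, hidden)        # initial hidden state for this single cell

outputs = []
for t in range(seq_len):
    h = cell(inputs[t], h)            # one recurrence step: (batch, hidden)
    outputs.append(h)

output = torch.stack(outputs)         # (seq, batch, hidden)
print(output.size())                  # torch.Size([100, 32, 50])
print(h.size())                       # torch.Size([32, 50])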
