RNN (Recurrent Neural Network)

1. Structure

[Figure: an RNN unrolled over time, with shared weights $U$, $W$, $V$]

An RNN feeds the state it produces at one moment back in as input to the next moment, so it can capture temporal information. The main part of the model, the state $S$, is computed by a simple feed-forward NN.

The weights $W, U, V$ are shared across all time steps.
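A minimal sketch of this recurrence, assuming the common $\tanh$ formulation $s_t = \tanh(U x_t + W s_{t-1})$, $o_t = V s_t$ (the function name and dimensions here are illustrative):

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Run a vanilla RNN over a sequence of input vectors xs.

    U, W, V are the shared weights, reused at every time step;
    s0 is the initial state.
    """
    s = s0
    outputs = []
    for x in xs:                      # one iteration per moment t
        s = np.tanh(U @ x + W @ s)    # s_t = tanh(U x_t + W s_{t-1})
        outputs.append(V @ s)         # o_t = V s_t
    return outputs, s

# Toy usage: 5 time steps, 3-dim inputs, 4-dim state, 2-dim outputs
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
W = rng.normal(size=(4, 4))
V = rng.normal(size=(2, 4))
xs = [rng.normal(size=3) for _ in range(5)]
outputs, final_state = rnn_forward(xs, U, W, V, np.zeros(4))
```

Note that the same `U`, `W`, `V` appear in every loop iteration; only the state `s` changes from step to step.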

2. Problems

  1. Gradient Vanishing

Gradient vanishing means an RNN cannot memorize long-term information: gradients from distant time steps shrink as they are backpropagated through the recurrence, so old information is covered by recent information. LSTM is a good improvement on this problem.
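A rough illustration of why this happens: backpropagation through time multiplies the gradient by (roughly) $W^\top$ once per step, so when the spectral norm of $W$ is below 1 the gradient shrinks geometrically. This sketch ignores the $\tanh$ derivative factors and forces the spectral norm by hand, purely for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 4
W = rng.normal(size=(hidden_dim, hidden_dim))
W *= 0.5 / np.linalg.norm(W, 2)       # force spectral norm 0.5 (< 1)

grad = np.ones(hidden_dim)
for t in range(1, 21):
    grad = W.T @ grad                 # one step of backprop through time
    if t % 5 == 0:
        print(f"step {t:2d}: |grad| = {np.linalg.norm(grad):.2e}")
# The norm decays geometrically, so signals from distant time steps
# contribute almost nothing to the weight updates.
```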

  2. Non-parallel computing

An RNN cannot be computed in parallel across time, because each time step needs the output of the previous one. The Transformer is a good improvement on this problem.
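A sketch of the contrast: the RNN requires a sequential loop, while self-attention (shown here in a simplified, single-head form with no learned projections, just to make the point) computes all positions in one batched matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4
xs = rng.normal(size=(T, d))          # T time steps of d-dim inputs
W = rng.normal(size=(d, d))

# RNN: an unavoidable sequential loop; step t depends on step t-1.
s = np.zeros(d)
states = []
for x in xs:
    s = np.tanh(W @ s + x)
    states.append(s)

# Self-attention: every position attends to every other position in
# one matrix product, so all T steps are computed at once.
scores = xs @ xs.T / np.sqrt(d)
scores -= scores.max(axis=1, keepdims=True)   # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
attended = weights @ xs
```

The loop in the first half is the bottleneck: no step can start before the previous one finishes, whereas the attention computation maps directly onto parallel hardware.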