RNN
RNN stands for Recurrent Neural Network.
1. Structure
An RNN feeds its output back in as the input of the next time step, so it can capture temporal information. The main part $S$ of the model is a simple NN.
The weights $W, U, V$ are shared across all time steps, as in the sketch below.
2. Problems
- Gradient vanishing
Gradient vanishing means an RNN cannot memorize long-term information: gradients shrink as they are propagated back through many time steps, so old information is overwritten by recent information. LSTM is a good improvement for this problem (see the numeric sketch after this list).
- Non-parallel computation
An RNN cannot be computed in parallel, because each time step needs the output of the previous one. The Transformer is a good improvement for this problem.