RNN (Recurrent Neural Network)

1. Structure

[Figure: an RNN unrolled over time, with shared weights $U$, $W$, $V$]

An RNN feeds the state it produces at one moment back in as input to the next moment, so it can capture temporal information. The main part of the model, the state $S$, is computed by a simple feed-forward NN.

The weights $W, U, V$ are shared across all time steps.
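A minimal sketch of this recurrence, assuming the common $\tanh$ formulation $s_t = \tanh(U x_t + W s_{t-1})$, $o_t = V s_t$ (the function name and dimensions here are illustrative):

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Run a vanilla RNN over a sequence of input vectors xs.

    U, W, V are the shared weights, reused at every time step;
    s0 is the initial state.
    """
    s = s0
    outputs = []
    for x in xs:                      # one iteration per moment t
        s = np.tanh(U @ x + W @ s)    # s_t = tanh(U x_t + W s_{t-1})
        outputs.append(V @ s)         # o_t = V s_t
    return outputs, s

# Toy usage: 5 time steps, 3-dim inputs, 4-dim state, 2-dim outputs
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
W = rng.normal(size=(4, 4))
V = rng.normal(size=(2, 4))
xs = [rng.normal(size=3) for _ in range(5)]
outputs, final_state = rnn_forward(xs, U, W, V, np.zeros(4))
```

Note that the same `U`, `W`, `V` appear in every loop iteration; only the state `s` changes from step to step.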

2. Problems

  1. Gradient Vanishing

Gradient vanishing means an RNN cannot memorize long-term information: gradients from distant time steps shrink as they are backpropagated through the recurrence, so old information is covered by recent information. LSTM is a good improvement on this problem.
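A rough illustration of why this happens: backpropagation through time multiplies the gradient by (roughly) $W^\top$ once per step, so when the spectral norm of $W$ is below 1 the gradient shrinks geometrically. This sketch ignores the $\tanh$ derivative factors and forces the spectral norm by hand, purely for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 4
W = rng.normal(size=(hidden_dim, hidden_dim))
W *= 0.5 / np.linalg.norm(W, 2)       # force spectral norm 0.5 (< 1)

grad = np.ones(hidden_dim)
for t in range(1, 21):
    grad = W.T @ grad                 # one step of backprop through time
    if t % 5 == 0:
        print(f"step {t:2d}: |grad| = {np.linalg.norm(grad):.2e}")
# The norm decays geometrically, so signals from distant time steps
# contribute almost nothing to the weight updates.
```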

  2. Non-parallel computing

An RNN cannot be computed in parallel across time, because each time step needs the output of the previous one. The Transformer is a good improvement on this problem.
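A sketch of the contrast: the RNN requires a sequential loop, while self-attention (shown here in a simplified, single-head form with no learned projections, just to make the point) computes all positions in one batched matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4
xs = rng.normal(size=(T, d))          # T time steps of d-dim inputs
W = rng.normal(size=(d, d))

# RNN: an unavoidable sequential loop; step t depends on step t-1.
s = np.zeros(d)
states = []
for x in xs:
    s = np.tanh(W @ s + x)
    states.append(s)

# Self-attention: every position attends to every other position in
# one matrix product, so all T steps are computed at once.
scores = xs @ xs.T / np.sqrt(d)
scores -= scores.max(axis=1, keepdims=True)   # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
attended = weights @ xs
```

The loop in the first half is the bottleneck: no step can start before the previous one finishes, whereas the attention computation maps directly onto parallel hardware.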