Some theory

Well, actually, there are plenty of useful resources, like this or this, that explain in detail how GRU/LSTM architectures work, all the math behind them, and so on, but I think all the explanations I’ve seen before are somewhat misleading. Many articles show pictures like these: