During the history of deep learning, many researchers, including some very well-known ones, proposed optimization algorithms and showed that they worked well on a few problems, but those algorithms were subsequently shown not to generalize well to the wide range of neural networks you might want to train. So over time, the deep learning community developed some skepticism about new optimization algorithms, and a lot of people felt that gradient descent with momentum really works well and that it was difficult to propose something that works much better. RMSprop and the Adam optimization algorithm are among the rare algorithms that have really stood up.
Initialize $V_{dW} = 0$, $S_{dW} = 0$, $V_{db} = 0$, $S_{db} = 0$, then implement the 'momentum' and 'RMSprop' updates together. Adam is a commonly used algorithm that has proven effective in many neural networks.
On iteration $t$:
Compute $dW$, $db$ on the current mini-batch.
$V_{dW} = \beta_1 V_{dW} + (1 - \beta_1) dW$, $V_{db} = \beta_1 V_{db} + (1 - \beta_1) db$ (the 'momentum' update)
$S_{dW} = \beta_2 S_{dW} + (1 - \beta_2) dW^2$, $S_{db} = \beta_2 S_{db} + (1 - \beta_2) db^2$ (the 'RMSprop' update)
Typically, bias correction is applied: $V_{dW}^{corrected} = V_{dW} / (1 - \beta_1^t)$, $V_{db}^{corrected} = V_{db} / (1 - \beta_1^t)$, $S_{dW}^{corrected} = S_{dW} / (1 - \beta_2^t)$, $S_{db}^{corrected} = S_{db} / (1 - \beta_2^t)$.
Then update the parameters: $W := W - \alpha \dfrac{V_{dW}^{corrected}}{\sqrt{S_{dW}^{corrected}} + \varepsilon}$, $b := b - \alpha \dfrac{V_{db}^{corrected}}{\sqrt{S_{db}^{corrected}} + \varepsilon}$
The name Adam comes from ADAptive Moment estimation.
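To make the update rule above concrete, here is a minimal NumPy sketch of one Adam step for a single parameter array. The function name `adam_update`, the toy quadratic example, and the hyperparameter defaults (learning rate 0.001, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$, which are the commonly recommended values) are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def adam_update(w, dw, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for parameter array `w` with gradient `dw`.

    `v` and `s` are the running first- and second-moment estimates
    (initialized to zeros), and `t` is the 1-based iteration count
    used for bias correction. Hyperparameter defaults follow the
    commonly used values; this is a sketch, not a library API.
    """
    # 'Momentum' part: exponentially weighted average of the gradient.
    v = beta1 * v + (1 - beta1) * dw
    # 'RMSprop' part: exponentially weighted average of the squared gradient.
    s = beta2 * s + (1 - beta2) * (dw ** 2)
    # Bias correction, which matters most in the early iterations
    # when v and s are still close to their zero initialization.
    v_hat = v / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    # Update: momentum direction scaled by the RMSprop denominator.
    w = w - lr * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s

# Toy usage (assumed example): minimize f(w) = ||w||^2, whose gradient is 2*w.
w = np.array([1.0, -2.0, 3.0])
v, s = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 501):
    dw = 2 * w                       # stands in for the mini-batch gradient
    w, v, s = adam_update(w, dw, v, s, t, lr=0.01)
print(w)                             # entries approach 0
```

In a full network, the same update would be applied to each parameter ($W^{[l]}$ and $b^{[l]}$ for every layer), with a separate pair of moment estimates per parameter array.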