Question:
Adam (page 340) was described as a combination of momentum and RMS-Prop. Using AIPython (aipython.org), Keras, or PyTorch (see Appendix B.2), find two datasets and compare the following:
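For reference, the standard Adam update (Kingma and Ba's formulation; page 340 of the text may use slightly different symbol names) is:

```latex
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= m_t/(1-\beta_1^t), \qquad \hat{v}_t = v_t/(1-\beta_2^t) \\
w_t &= w_{t-1} - \eta\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)
\end{align*}
```

Here $g_t$ is the gradient at step $t$, $\eta$ is the step size, and $\hat{m}_t$, $\hat{v}_t$ are the bias-corrected moment estimates. Setting $\beta_1$ or $\beta_2$ to 0 in these equations is what parts (a)-(c) ask you to work through.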
(a) How does Adam with β1 = β2 = 0 differ from plain stochastic gradient descent without momentum? [Hint: How does setting β1 = β2 = 0 simplify Adam, considering first the case where g?] Which works better on the datasets selected?
(b) How does Adam with β2 = 0 differ from stochastic gradient descent, when the α momentum parameter is equal to β1 in Adam? [Hint: How does setting β2 = 0 simplify Adam, considering first the case where g?] Which works better on the datasets selected?
(c) How does Adam with β1 = 0 differ from RMS-Prop, where the ρ parameter in RMS-Prop is equal to β2 in Adam? Which works better on the datasets selected?
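As a starting point for all three parts, each special case can be checked numerically. The following is a minimal, self-contained Python sketch of a single Adam step (not AIPython/Keras/PyTorch code; the function name `adam_step` and the scalar setup are illustrative), used to see how the update simplifies in cases (a)-(c):

```python
import math

def adam_step(g, m, v, t, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    g: current gradient; m, v: running first/second moment estimates;
    t: step count (1-based, used for bias correction).
    Returns (delta, m, v), where the parameter update is w -= delta.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    return eta * m_hat / (math.sqrt(v_hat) + eps), m, v

g, eta = 0.5, 0.1

# (a) beta1 = beta2 = 0: m = g and v = g*g, so the step becomes
#     eta * g / (|g| + eps), i.e. approximately eta * sign(g).
#     Plain SGD instead takes eta * g, whose size scales with |g|.
delta_a, _, _ = adam_step(g, 0.0, 0.0, t=1, eta=eta, beta1=0.0, beta2=0.0)
print("Adam(b1=b2=0):", delta_a, "  SGD:", eta * g)

# (b) beta2 = 0: the numerator is the momentum-style running average of
#     gradients (with momentum parameter beta1), but it is still divided
#     by |g| + eps, unlike SGD with momentum, which uses the average alone.

# (c) beta1 = 0: the step is eta * g / (sqrt(v_hat) + eps), which matches
#     RMS-Prop with rho = beta2, apart from Adam's bias correction of v
#     (and possibly the placement of eps, which varies between formulations).
```

Running the same comparison on the two chosen datasets (e.g. via the optimizer hyperparameters exposed by Keras or PyTorch) then lets you answer which variant works better empirically.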
Source: David L. Poole and Alan K. Mackworth, Artificial Intelligence: Foundations of Computational Agents, 3rd Edition. ISBN 9781009258197.