Question
Deriving the Glorot initialization scheme. The pre-2010 phase of deep learning research made extensive use of model pre-training, since training deep models was thought to be very hard. This changed in 2010, due in part to a paper by Xavier Glorot and Yoshua Bengio showing that deep models can be trained simply by ensuring good initialization. Their key insight was that each layer in a deep network should preserve the variance of the data passed through it: if it does not, the effect compounds multiplicatively, and deeper networks distort the variance even more. Derive the Glorot initialization scheme for a ReLU layer using this principle, applying the constraint to both the forward and backward passes.
Step by Step Solution
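What follows is a sketch of the derivation under the usual assumptions (not stated in the question itself): a fully connected layer computes pre-activations $a = Wx$ with $n_\mathrm{in}$ inputs and $n_\mathrm{out}$ outputs, the weights $W_{ji}$ are drawn i.i.d. with zero mean and variance $\sigma^2$, the inputs are i.i.d. and independent of the weights, and the pre-activations are symmetric about zero, so the ReLU passes exactly half of their second moment.

Step 1 (forward pass). Independence and $\mathbb{E}[W_{ji}] = 0$ give
$$\operatorname{Var}(a_j) = \operatorname{Var}\!\Big(\sum_{i=1}^{n_\mathrm{in}} W_{ji}\,x_i\Big) = n_\mathrm{in}\,\sigma^2\,\mathbb{E}[x_i^2].$$
After the ReLU $h_j = \max(0, a_j)$, symmetry of $a_j$ gives $\mathbb{E}[h_j^2] = \tfrac{1}{2}\,\mathbb{E}[a_j^2]$. Requiring the second moment to be preserved, $\mathbb{E}[h_j^2] = \mathbb{E}[x_i^2]$, yields
$$\tfrac{1}{2}\,n_\mathrm{in}\,\sigma^2 = 1 \quad\Longrightarrow\quad \sigma^2 = \frac{2}{n_\mathrm{in}}.$$

Step 2 (backward pass). The gradient with respect to an input is
$$\frac{\partial L}{\partial x_i} = \sum_{j=1}^{n_\mathrm{out}} W_{ji}\,\mathbf{1}[a_j > 0]\,\frac{\partial L}{\partial a_j}.$$
The ReLU gate is active for half the units on average, so applying the same argument to the gradient variance gives
$$\tfrac{1}{2}\,n_\mathrm{out}\,\sigma^2 = 1 \quad\Longrightarrow\quad \sigma^2 = \frac{2}{n_\mathrm{out}}.$$

Step 3 (compromise). The two constraints can only hold simultaneously when $n_\mathrm{in} = n_\mathrm{out}$, so, following Glorot and Bengio, average the fan counts:
$$\sigma^2 = \frac{2}{\tfrac{1}{2}(n_\mathrm{in} + n_\mathrm{out})} = \frac{4}{n_\mathrm{in} + n_\mathrm{out}},$$
for example $W_{ji} \sim \mathcal{N}\!\big(0,\ \tfrac{4}{n_\mathrm{in}+n_\mathrm{out}}\big)$ or, using $\operatorname{Var}\big(\mathcal{U}(-a,a)\big) = a^2/3$, $W_{ji} \sim \mathcal{U}\!\big(-\sqrt{\tfrac{12}{n_\mathrm{in}+n_\mathrm{out}}},\ \sqrt{\tfrac{12}{n_\mathrm{in}+n_\mathrm{out}}}\big)$. Keeping only the forward constraint, $\sigma^2 = 2/n_\mathrm{in}$, recovers He (Kaiming) initialization; the factor of 2 relative to the original Glorot scheme comes from the ReLU discarding half of the second moment.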
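A minimal NumPy sketch of the resulting initializer follows; the helper name relu_glorot_init and the example layer sizes are hypothetical, chosen only for illustration.

import numpy as np

def relu_glorot_init(fan_in, fan_out, uniform=True, rng=None):
    """Sample a (fan_out, fan_in) weight matrix for a ReLU layer with
    Var(W) = 4 / (fan_in + fan_out), the compromise between the forward
    constraint 2/fan_in and the backward constraint 2/fan_out."""
    rng = np.random.default_rng() if rng is None else rng
    var = 4.0 / (fan_in + fan_out)
    if uniform:
        limit = np.sqrt(3.0 * var)  # Var of U(-a, a) is a^2 / 3
        return rng.uniform(-limit, limit, size=(fan_out, fan_in))
    return rng.normal(0.0, np.sqrt(var), size=(fan_out, fan_in))

# Example: first layer of an MLP mapping 784 inputs to 256 hidden units.
W1 = relu_glorot_init(784, 256)

The uniform and normal variants share the same variance, so the choice between them does not affect the derivation above.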