
Question


Hi, I am looking for guidance on questions (a) and (b) please. Could you please help with these calculations?


6. [21 points] Batch Normalization

Machine learning algorithms tend to give better results when input features are normalized and uncorrelated. For this reason, data normalization is a common pre-processing step in deep learning. However, the distribution of each layer's inputs may also vary over time and across layers, sometimes making training deep networks more difficult. Batch Normalization is a method that draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch.

Given a batch $B = (x^{(1)}, \dots, x^{(m)})$ of vectors in $\mathbb{R}^d$, we first normalize inputs into $(\hat{x}^{(1)}, \dots, \hat{x}^{(m)})$ and then linearly transform them into $(y^{(1)}, \dots, y^{(m)})$. A Batch Normalization layer also has parameters $\gamma \in \mathbb{R}^d$ and $\beta \in \mathbb{R}^d$. The layer works as follows:

[Figure 3: Batch Normalization. Pipeline: $x$ → compute $\mu^{(B)}, v^{(B)}$ → normalize → linear transformation (parameters $\gamma, \beta$) → $y$]

First, compute the batch mean vector

$$\mu^{(B)} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \in \mathbb{R}^d \tag{7}$$

Then, compute the batch variance vector $v^{(B)} \in \mathbb{R}^d$, which is defined element-wise for any $j \in \{1, \dots, d\}$ by:

$$v_j^{(B)} = \frac{1}{m} \sum_{i=1}^{m} \left( x_j^{(i)} - \mu_j^{(B)} \right)^2 \tag{8}$$

Then, normalize the vectors in $B$ by computing vectors $(\hat{x}^{(1)}, \dots, \hat{x}^{(m)})$ such that $\hat{x}^{(i)} \in \mathbb{R}^d$ is defined element-wise for any $j \in \{1, \dots, d\}$ by:

$$\hat{x}_j^{(i)} = \frac{x_j^{(i)} - \mu_j^{(B)}}{\sqrt{v_j^{(B)}}} \tag{9}$$

Finally, output vectors $(y^{(1)}, \dots, y^{(m)})$ in $\mathbb{R}^d$ defined by

$$y^{(i)} = \gamma \odot \hat{x}^{(i)} + \beta \tag{10}$$

where $\odot$ is the element-wise vector multiplication.

In this question, you will derive the back-propagation rules for batch normalization. Let $\mathcal{L} = \mathcal{L}(y^{(1)}, \dots, y^{(m)})$ be some scalar function of the batch-normalization output vectors. We will calculate the gradients of $\mathcal{L}$ with respect to the parameters and some intermediate variables.

(a) [6 points] Calculate the gradient of $\mathcal{L}$ w.r.t. $\beta$, $\gamma$, and $\hat{x}^{(i)}$ for $i \in \{1, \dots, m\}$; i.e. calculate $\frac{\partial \mathcal{L}}{\partial \beta}$, $\frac{\partial \mathcal{L}}{\partial \gamma}$, and $\frac{\partial \mathcal{L}}{\partial \hat{x}^{(i)}}$ for $i \in \{1, \dots, m\}$. Your answer can depend on $\frac{\partial \mathcal{L}}{\partial y^{(i)}}$ and the parameters and variables in the forward pass (such as the $\hat{x}^{(i)}$'s). Your answer may be vectorized or un-vectorized (e.g. $\frac{\partial \mathcal{L}}{\partial \beta}$ or $\frac{\partial \mathcal{L}}{\partial \beta_j}$ for $j \in \{1, \dots, d\}$).

(b) [3 points] Calculate the gradient of $\mathcal{L}$ w.r.t. $v^{(B)}$; i.e. calculate $\frac{\partial \mathcal{L}}{\partial v^{(B)}}$. Your answer can depend on $\frac{\partial \mathcal{L}}{\partial y^{(i)}}$ and the parameters and variables in the forward pass, as well as the gradients that you calculated in part (a). Your answer may be vectorized or un-vectorized ($\frac{\partial \mathcal{L}}{\partial v^{(B)}}$ or $\frac{\partial \mathcal{L}}{\partial v_j^{(B)}}$ for $j \in \{1, \dots, d\}$).

(c) [6 points] Show that the gradient of $\mathcal{L}$ w.r.t. $\mu^{(B)}$ is

$$\frac{\partial \mathcal{L}}{\partial \mu^{(B)}} = -\frac{1}{\sqrt{v^{(B)}}} \odot \sum_{i=1}^{m} \frac{\partial \mathcal{L}}{\partial \hat{x}^{(i)}}$$

Please show all of your work. Note that $\frac{1}{\sqrt{v^{(B)}}}$ is the vector where the $j$th element is $\frac{1}{\sqrt{v_j^{(B)}}}$ (i.e. element-wise square root and reciprocal).

Hint: Consider applying the chain rule using $\hat{x}^{(i)}$ for $i \in \{1, \dots, m\}$ as the intermediate variables, and note that $\hat{x}^{(i)}$ depends on $\mu^{(B)}$ both directly and indirectly through $v^{(B)}$.

(d) [6 points] Show that the gradient of $\mathcal{L}$ w.r.t. $x^{(i)}$ for $i \in \{1, \dots, m\}$ is

$$\frac{\partial \mathcal{L}}{\partial x^{(i)}} = \frac{1}{\sqrt{v^{(B)}}} \odot \frac{\partial \mathcal{L}}{\partial \hat{x}^{(i)}} + \frac{2\left(x^{(i)} - \mu^{(B)}\right)}{m} \odot \frac{\partial \mathcal{L}}{\partial v^{(B)}} + \frac{1}{m} \frac{\partial \mathcal{L}}{\partial \mu^{(B)}}$$

Please show all of your work. Note that $\frac{1}{\sqrt{v^{(B)}}}$ is the vector where the $j$th element is $\frac{1}{\sqrt{v_j^{(B)}}}$ (i.e. element-wise square root and reciprocal).

Hint: Consider applying the chain rule using $\hat{x}^{(k)}$ for $k \in \{1, \dots, m\}$ as the intermediate variables, and note that $\hat{x}^{(k)}$ for $k \in \{1, \dots, m\}$ depends on $x^{(i)}$ directly through $\hat{x}^{(i)}$ for $k = i$ and indirectly through $\mu^{(B)}$ and $v^{(B)}$ for $k \in \{1, \dots, m\}$.

Remarks on the broader context: After obtaining $\frac{\partial \mathcal{L}}{\partial x^{(i)}}$ (as a function of $\frac{\partial \mathcal{L}}{\partial y^{(i)}}$ and other quantities known in the forward pass), one can propagate the gradient backwards to other layers that generated the $x^{(i)}$'s (you are not asked to show this). Empirically, it turns out to be important to consider $\mu^{(B)}$ and $v^{(B)}$ as variables (instead of constants), so that in the chain rule, we consider the gradient through $\mu^{(B)}$ and $v^{(B)}$ ...
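One way to sanity-check candidate answers for (a) and (b) is to implement the forward pass and compare the resulting gradient against finite differences. The sketch below is not an official solution. It assumes the standard chain-rule results: for (a), the gradient w.r.t. beta is the sum over i of dL/dy(i), the gradient w.r.t. gamma is the sum over i of dL/dy(i) times x-hat(i) element-wise, and the gradient w.r.t. x-hat(i) is gamma times dL/dy(i) element-wise; for (b), the j-th component of the gradient w.r.t. v(B) is the sum over i of dL/dx-hat_j(i) * (x_j(i) - mu_j) * (-1/2) * v_j^(-3/2). The squared-error loss, the targets `T`, and the function names `bn_forward`, `bn_backward`, `loss`, and `numeric_grad_x` are all hypothetical and exist only to make the check runnable.

```python
import math
import random


def bn_forward(X, gamma, beta):
    """Forward pass: X is a list of m vectors, each a list of length d."""
    m, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(d)]                   # batch mean
    var = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(d)]   # batch variance
    Xhat = [[(x[j] - mu[j]) / math.sqrt(var[j]) for j in range(d)] for x in X]
    Y = [[gamma[j] * xh[j] + beta[j] for j in range(d)] for xh in Xhat]
    return Y, (X, Xhat, mu, var, gamma)


def bn_backward(dY, cache):
    """Backward pass using the assumed answers to (a), (b) and the stated (c), (d)."""
    X, Xhat, mu, var, gamma = cache
    m, d = len(X), len(X[0])
    # (a) gradients w.r.t. beta, gamma and the normalized inputs
    dbeta = [sum(dY[i][j] for i in range(m)) for j in range(d)]
    dgamma = [sum(dY[i][j] * Xhat[i][j] for i in range(m)) for j in range(d)]
    dXhat = [[gamma[j] * dY[i][j] for j in range(d)] for i in range(m)]
    # (b) gradient w.r.t. the batch variance
    dvar = [sum(dXhat[i][j] * (X[i][j] - mu[j]) for i in range(m))
            * (-0.5) * var[j] ** -1.5 for j in range(d)]
    # (c) gradient w.r.t. the batch mean
    dmu = [-sum(dXhat[i][j] for i in range(m)) / math.sqrt(var[j]) for j in range(d)]
    # (d) gradient w.r.t. each input vector
    dX = [[dXhat[i][j] / math.sqrt(var[j])
           + 2.0 * (X[i][j] - mu[j]) / m * dvar[j]
           + dmu[j] / m for j in range(d)] for i in range(m)]
    return dX, dgamma, dbeta


def loss(Y, T):
    """Hypothetical scalar loss: half the squared error against targets T."""
    return 0.5 * sum((y - t) ** 2 for yr, tr in zip(Y, T) for y, t in zip(yr, tr))


def numeric_grad_x(X, gamma, beta, T, i, j, eps=1e-6):
    """Central finite difference of the loss w.r.t. X[i][j]."""
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[i][j] += eps
    Xm[i][j] -= eps
    lp = loss(bn_forward(Xp, gamma, beta)[0], T)
    lm = loss(bn_forward(Xm, gamma, beta)[0], T)
    return (lp - lm) / (2 * eps)


random.seed(0)
m, d = 4, 3
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
T = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
gamma = [random.uniform(0.5, 1.5) for _ in range(d)]
beta = [random.uniform(-1.0, 1.0) for _ in range(d)]

Y, cache = bn_forward(X, gamma, beta)
dY = [[y - t for y, t in zip(yr, tr)] for yr, tr in zip(Y, T)]  # dL/dy for this loss
dX, dgamma, dbeta = bn_backward(dY, cache)

# Largest gap between the analytic and the finite-difference gradient.
max_err = max(abs(dX[i][j] - numeric_grad_x(X, gamma, beta, T, i, j))
              for i in range(m) for j in range(d))
print("max abs difference:", max_err)
```

If the formulas are right, the printed difference should be tiny compared with the gradient values themselves. The gradient in (d) depends on the answers to (a) and (b), so an error in either would show up here as a large mismatch.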

