Question

(d) Suppose max pooling is applied on an 8 × 8 image with a 2 × 2 filter
and stride 2 pixels. What will be the number of parameters in this
layer?
(1 mark)
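
As a minimal sketch for part (d), assuming an 8 × 8 input, a 2 × 2 window, stride 2, and no padding: the pooling below only takes the maximum over each window, so it has hyperparameters (window size, stride) but no learnable weights. The function name `max_pool2d` and the sample image are illustrative, not part of the question.

```python
def max_pool2d(image, size=2, stride=2):
    """Max pooling over a 2D list of numbers, with no padding."""
    height, width = len(image), len(image[0])
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    return [
        [
            # Take the largest value inside the current window.
            max(
                image[r][c]
                for r in range(i * stride, i * stride + size)
                for c in range(j * stride, j * stride + size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 8 x 8 image whose entries are 0..63 in row-major order.
image = [[8 * r + c for c in range(8)] for r in range(8)]
pooled = max_pool2d(image)

# Nothing above is learned during training: the layer has no weights or biases.
learnable_parameters = 0

print(len(pooled), len(pooled[0]))  # spatial size of the pooled output
print(learnable_parameters)
```

Under these assumptions the output is 4 × 4, and the parameter count stays at zero whatever the input size.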
(e) Consider the following plot of the number of stochastic gradient
descent (SGD) iterations required to reach a given loss, as a function
of the batch size:
For small batch sizes, the number of iterations required to reach the target
loss decreases as the batch size increases. Why is that?
(2 marks)
(f) Write down the number of parameters in each field. Assume the
convolution filter is of shape 3 × 3 × 64, what would be the values in
the fields II, III, and V?
(3 marks)
(g) You are given a black box optimizer which produces the loss curve
shown in Figure A. You see a big red button on the optimizer and
decide to push it. After doing this, you notice the loss curve shown in
Figure B. You press the button one more time and finally notice the
loss curve shown in Figure C.
[Figures A, B and C: loss curves, not reproduced here]
The red button modifies a single hyperparameter. Which hyperparameter is
most likely to be modified by pressing the button?
(1 mark)
Also, of experiments 1, 2 and 3, which corresponds to the largest magnitude of
the hyperparameter?
(1 mark)
Lastly, the loss curve for experiment 3 seems to be the most desirable.
Despite this, give two reasons why you would choose the hyperparameter
in experiment 2 for training your model.
(2 marks)
Neural networks.
(a) Let us say you have a training set $s$ containing $m$ pairs $(x_i, y_i)$ where
vector $x_i$ is to be assigned to one of $K$ classes in a supervised setting
and the labels $y_i$ are vectors in $\{0,1\}^K$ containing a single 1
representing the target class, i.e., if there are 5 classes and some
$x_i$ should be assigned to class 2 then $y_i = (0,1,0,0,0)$. To do this, it is
proposed that you use $K$ neural networks. The $i$th network has
parameters $w_i$ and computes the function $h(w_i, x)$. You may make no
further assumptions regarding the function $h$.
You aim to treat the output of the $i$th network as an estimate of the
probability $P(x \text{ in class } i \mid x, w)$ that $x$ should be in the $i$th class,
where $w$ collects together all the $K$ vectors $w_1, \dots, w_K$. It is
proposed that to do this you should modify the setup described to
compute
$$P(x \text{ in class } i \mid x, w) = \mathrm{prob}(i, x) = \frac{\exp(h(w_i, x))}{\sum_{j=1}^{K} \exp(h(w_j, x))}$$
Explain why this modification is required, and how it achieves the
stated aim?
(4 marks)
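
A short sketch of the normalisation in part (a) may help. Since nothing is assumed about $h$, the raw outputs $h(w_i, x)$ can be negative and need not sum to one, so they cannot be read as probabilities directly. Exponentiating makes every term positive, and dividing by the sum over all $K$ networks makes the results sum to one. The scores below are hypothetical values standing in for the $K$ network outputs, and subtracting the maximum is a standard numerical-stability step that leaves the result unchanged.

```python
import math


def softmax(scores):
    """Map K real-valued scores to K positive values that sum to 1."""
    # Shifting by the maximum avoids overflow in exp; the ratio is unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs h(w_i, x) for K = 3 networks; one is negative.
raw_outputs = [2.0, -1.0, 0.5]
probs = softmax(raw_outputs)

print(probs)
print(sum(probs))
```

The ordering of the raw outputs is preserved, so the network with the largest $h(w_i, x)$ still receives the largest probability.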
(b) Suppose a convolution layer takes a 32 × 32 × 3 input volume, and
applies ten 5 × 5 filters with stride 1 pixel and padding 2 pixels. What
will be the size of the output volume?
(2 marks)
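
The arithmetic for part (b) can be sketched with the usual output-size formula, (W - F + 2P) / S + 1 per spatial dimension, with the output depth equal to the number of filters. The helper name `conv_output_size` is illustrative, and the values assume a 32 × 32 × 3 input, ten 5 × 5 filters, stride 1 and padding 2.

```python
def conv_output_size(input_size, filter_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1


num_filters = 10
side = conv_output_size(input_size=32, filter_size=5, stride=1, padding=2)

# Each filter spans the full input depth and produces one output channel.
output_volume = (side, side, num_filters)

print(output_volume)
```

With these values, padding 2 exactly offsets the shrinkage from a 5 × 5 filter, so the spatial size stays 32 and the output volume is 32 × 32 × 10.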
(c) Given the graphs of testing and training error, do you think an
evident problem here is overfitting? Yes or no, please justify your
answer!
(4 marks)