Question

Please help with this for deep learning. Code in Python.
Problem 3: For a given number of parameters P, let m_P(k) be the number of nodes per layer such that Params(k, m) = P (or as close to P as possible). A network with k layers and m_P(k) nodes per layer should therefore have approximately P total parameters. For such a network, we can define trainLoss(k, P) and testLoss(k, P) for each k from 1 to k_P.
Identify 10 values of P, sufficiently distinct that the resulting networks differ in shape and scale. How did you make your choice? Use the linear softmax model as a baseline.
Note, because of the integer/rounding issues, your P values should be distinct enough so that you don't accidentally create networks with the same m and k for two distinct P values.
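One way to compute m_P(k) and to check the rounding note above is sketched here. The definition of Params(k, m) is not given in this question, so the sketch assumes fully connected layers with biases, k hidden layers of m nodes each, and placeholder sizes d_in = 784 and n_classes = 10. The names params, m_P, find_collisions, P_values and k_values are all hypothetical, and the geometric spacing of P is only one possible choice.

```python
def params(k, m, d_in=784, n_classes=10):
    """Weights plus biases of an MLP with k hidden layers of m nodes each.

    ASSUMPTION: dense layers with biases; d_in and n_classes are placeholders.
    k == 0 is the linear softmax baseline (input connected directly to output).
    """
    if k == 0:
        return (d_in + 1) * n_classes
    return (d_in + 1) * m + (k - 1) * (m + 1) * m + (m + 1) * n_classes


def m_P(k, P, d_in=784, n_classes=10):
    """Width m >= 1 whose parameter count is closest to P for k hidden layers."""
    # params() grows strictly with m, so first bracket P, then bisect.
    hi = 1
    while params(k, hi, d_in, n_classes) < P:
        hi *= 2
    lo = 1
    while lo < hi:
        mid = (lo + hi) // 2
        if params(k, mid, d_in, n_classes) < P:
            lo = mid + 1
        else:
            hi = mid
    # lo is the smallest width reaching P; the width just below may be closer.
    candidates = [m for m in (lo - 1, lo) if m >= 1]
    return min(candidates, key=lambda m: abs(params(k, m, d_in, n_classes) - P))


def find_collisions(P_values, k_values):
    """Return every (k, m) shape that is produced by more than one P."""
    seen, collisions = set(), []
    for P in P_values:
        for k in k_values:
            shape = (k, m_P(k, P))
            if shape in seen:
                collisions.append(shape)
            seen.add(shape)
    return collisions


# ASSUMPTION: ten budgets, each double the previous one (10,000 up to 5,120,000).
P_values = [10_000 * 2 ** i for i in range(10)]
k_values = [1, 2, 4, 8]

for P in P_values:
    print(P, [(k, m_P(k, P)) for k in k_values])
print("collisions:", find_collisions(P_values, k_values))
```

Doubling P at each step keeps the widths well apart even for the smallest budgets, which is why the collision list comes back empty for these depths. A different input size or class count changes the widths, so the check is worth rerunning for the actual dataset.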
Plot (overlaying the curves for each P) trainLoss(k, P), with the x-axis as k/k_P from 0 to 1.
Note, you probably do not want to test every possible k value, due to time constraints. But test enough k values so that the trend is clear.
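A possible way to pick a manageable subset of depths and place them on the k/k_P axis is sketched below. It assumes k_P is already known for each P (its definition is not included in this question). The geometric spacing is a design choice rather than a requirement: it samples shallow networks densely and deep ones sparsely. The names sample_depths and normalized_depths are hypothetical.

```python
def sample_depths(k_max, n_points=8):
    """Roughly geometrically spaced integer depths in [1, k_max], ends included."""
    if k_max <= n_points:
        return list(range(1, k_max + 1))
    # Rounding can map two exponents to the same depth, so deduplicate via a set.
    depths = {round(k_max ** (i / (n_points - 1))) for i in range(n_points)}
    return sorted(depths)


def normalized_depths(depths, k_max):
    """Map each depth k to k / k_max so curves for different P share one x-axis."""
    return [k / k_max for k in depths]


# Example with a hypothetical k_P of 100.
ks = sample_depths(100)
xs = normalized_depths(ks, 100)
print(list(zip(ks, xs)))
```

The resulting xs can be passed as the x values to any plotting library, with one curve per P. If the interesting behaviour turns out to sit at large k, linear spacing may show the trend more clearly.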
Plot (overlaying the curves for each P) testLoss(k, P), with the x-axis as k/k_P from 0 to 1.
How do the results compare to the baseline performance of the linear softmax model?
What do you notice about the underlying trends? Is there a point where layers become too narrow to be useful, and if so, where is it? What seems to be the sweet spot, if any, for network shape? How does it depend on P?
Create a plot showing (overlaying the curves for each P) total training time in terms of passes through the data. Set the x-axis to go over k/k_P for ease of comparison. Does this change your assessment of the network shape tradeoff at all?
Bonus:
Is total parameters P a fair comparison point? Try to find a better one. Justify it.
Does introducing regularization (weight decay) or normalization layers help?
Problem 4: For a P of your choice and the optimal network shape as determined above - try to find an even better network shape (layers of unequal size, for instance) that gives better results for the same (approximate) total number of parameters. Is it better to have uniform layers? Layers of decreasing size? Increasing size? Experiment with it, and summarize your results. You may want to save the best model you find.
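For Problem 4, candidate shapes are only comparable if they land on roughly the same parameter total. The sketch below is one way to do that: it takes a relative width profile, for example [4, 2, 1] for decreasing layers, and scales it by a common factor until the total is as close to P as that profile allows. It assumes dense layers with biases and placeholder sizes d_in = 784 and n_classes = 10. The names count_params and scale_to_budget are hypothetical.

```python
def count_params(widths, d_in=784, n_classes=10):
    """Weights plus biases of an MLP whose hidden layers have the given widths.

    ASSUMPTION: dense layers with biases; d_in and n_classes are placeholders.
    """
    sizes = [d_in, *widths, n_classes]
    return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))


def scale_to_budget(profile, P, d_in=784, n_classes=10):
    """Scale a relative width profile so the parameter total is close to P."""

    def widths_at(s):
        return [max(1, round(s * r)) for r in profile]

    def total(s):
        return count_params(widths_at(s), d_in, n_classes)

    # The total never decreases as s grows, so bracket P and then bisect on s.
    lo, hi = 0.0, 1.0
    while total(hi) < P:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if total(mid) < P:
            lo = mid
        else:
            hi = mid
    # Keep whichever side of the threshold lands nearer to the budget.
    return min(
        (widths_at(lo), widths_at(hi)),
        key=lambda w: abs(count_params(w, d_in, n_classes) - P),
    )


P = 100_000
for name, profile in [("uniform", [1, 1, 1]),
                      ("decreasing", [4, 2, 1]),
                      ("increasing", [1, 2, 4])]:
    widths = scale_to_budget(profile, P)
    print(name, widths, count_params(widths))
```

Each shape can then be trained with the same settings and compared on test loss. The profiles listed are only starting points, and a bottleneck or hourglass profile fits the same helper.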
Bonus: Does regularization help?