Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

We have so far been concerned with predicting numerical quantities, like salaries. Now suppose we want to predict which dining option on campus a UCSD

image text in transcribed
image text in transcribed
image text in transcribed
We have so far been concerned with predicting numerical quantities, like salaries. Now suppose we want to predict which dining option on campus a UCSD student will select for lunch (e.g., Burger King, Tahini, FanFan, Dirty Birds, Taco Villa, etc.). Predicting a discrete category (as opposed to a number) is an important machine learning task called classification. We can use empirical risk minimization to make classifications, too. Suppose we have gathered a data set of n lunch preferences of students over the past week: Burger King Dirty Birds Burger King Tahini Taco Villa Burger King Dirty Birds Taco Villa The first step is to choose a number which will uniquely represent each option. For instance: Dirty Birds 1 Burger King 2 Tahini 3 Fan-Fan 4 Taco Villa 5 We then map each instance of a dining option to its corresponding number, giving us a new data set of numbers. For instance, the data above becomes: We then map each instance of a dining option to its corresponding number, giving us a new data set of numbers. For instance, the data above becomes: 21235215 a) 6. Now that we have converted the data to a list of numbers, we can make a prediction by minimizing the mean absolute loss. Explain why this is not a good idea. b) 66 Let's use the zero-one loss, which is defined as follows: L01(h,y)={0,1,h=yh=y As usual, define the risk to be the average loss: R01(h)=n1i=1nL01(h,y) Notice that R01(h) can be interpreted as the misclassification rate. That is, if R01(h)=.7, then predicting h would result in the wrong answer for 70% of the data points. Given the data set {4,2,4,1,3,4,4,3,2,5}, plot the empirical risk R01(h) for h[0,5]. Hint: the function should have point discontinuities. c) 6. Is gradient descent useful for minimizing the risk with zero-one loss? Why or why not? Make reference to your plot of the risk in your answer. Hint: the risk is indeed non-convex, but gradient descent can still be useful for minimizing non-convex functions. Is there some other reason

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

More Books

Students also viewed these Databases questions

Question

Define security and privacy. How are these two concepts related?

Answered: 1 week ago

Question

Define Administration?

Answered: 1 week ago

Question

Define Decision making

Answered: 1 week ago