Question

Consider a binary classification problem where there is a single feature $X \in \mathbb{R}$ and the dependent variable $Y \in \{0, 1\}$. Let $P_{X,Y}$ denote the joint distribution over pairs $(X, Y)$, and let $h : \mathbb{R} \to \{0, 1\}$ denote a generic classifier. We define the error rate of $h$ as $R(h) := \Pr(Y \neq h(X))$, where the probability is computed from the joint distribution $P_{X,Y}$. Suppose that we collect data $(x_1, y_1), \ldots, (x_n, y_n)$, which are assumed to be independent and identically distributed from the distribution $P_{X,Y}$, and that we train a classifier using this data.

Below we consider two precise specifications of the joint distribution $P_{X,Y}$. In both cases, derive the largest numerical value $\epsilon^*$ that you can for which it holds that $R(h) \geq \epsilon^*$. Carefully explain how you arrived at your specific value of $\epsilon^*$ in both cases.

a) (10 points) $X$ is restricted to the set $\{0, 1\}$ (i.e., is categorical) and $P_{X,Y}$ is given by the distribution in Table 1.

b) (10 points) The marginal distribution of $Y$ is given by $\Pr(Y = 0) = 0.3$ and $\Pr(Y = 1) = 0.7$. Given that $Y = 0$, the distribution of $X$ is normal with mean 5 and variance 2. Given that $Y = 1$, the distribution of $X$ is normal with mean 3 and variance 2. (It is OK for the final answer to be written in terms of the CDF of a normal distribution and/or to numerically approximate this number up to 5 digits.)

Table 1

Outcome          Probability
(X = 0, Y = 0)   0.1
(X = 0, Y = 1)   0.2
(X = 1, Y = 0)   0.4
(X = 1, Y = 1)   0.3

Step by Step Solution

There are 3 Steps involved in it

Step: 1

To find the largest numerical value $\epsilon^*$ for which the error rate of every classifier satisfies $R(h) \geq \epsilon^*$, we need to calculate the error rate under two different ...
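The bound being asked for can be written down before looking at either case. The following is a sketch that assumes $\epsilon^*$ is meant to be the Bayes error rate, i.e. the smallest error rate any classifier can achieve. The symbols $\eta$, $h^*$, $\pi_y$ and $f_y$ are introduced here for illustration and do not appear in the question.

```latex
\begin{align*}
\eta(x) &:= \Pr(Y = 1 \mid X = x), \\
R(h) &= \mathbb{E}\big[\Pr(Y \neq h(X) \mid X)\big]
      \;\geq\; \mathbb{E}\big[\min\{\eta(X),\, 1 - \eta(X)\}\big], \\
h^*(x) &:= \mathbf{1}\{\eta(x) \geq 1/2\}
      \quad\Longrightarrow\quad
      R(h^*) = \mathbb{E}\big[\min\{\eta(X),\, 1 - \eta(X)\}\big] =: \epsilon^*, \\
\text{discrete } X:\quad
\epsilon^* &= \sum_{x} \min\{\Pr(X = x, Y = 0),\; \Pr(X = x, Y = 1)\}, \\
\text{continuous } X:\quad
\epsilon^* &= \int \min\{\pi_0 f_0(x),\; \pi_1 f_1(x)\}\, dx,
\qquad \pi_y := \Pr(Y = y),\; f_y := \text{density of } X \mid Y = y.
\end{align*}
```

Because $h^*$ attains the bound, no larger value of $\epsilon^*$ can hold for every classifier, regardless of how the classifier was trained.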

Step: 2

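One possible way to work part (a), assuming $\epsilon^*$ is the Bayes error rate: for each value of $X$, the best classifier predicts the label with the larger joint probability and errs with the smaller one. The sketch below encodes Table 1 as a dictionary; the names `joint`, `bayes_classifier` and `bayes_error` are illustrative choices, not taken from the source.

```python
# Joint distribution from Table 1, keyed by (x, y).
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.3}


def bayes_classifier(joint):
    """For each x, predict the label y with the larger joint probability."""
    xs = sorted({x for x, _ in joint})
    return {x: max((0, 1), key=lambda y: joint[(x, y)]) for x in xs}


def bayes_error(joint):
    """Sum over x of the smaller joint probability (the unavoidable error)."""
    xs = sorted({x for x, _ in joint})
    return sum(min(joint[(x, 0)], joint[(x, 1)]) for x in xs)


h_star = bayes_classifier(joint)
eps_star = bayes_error(joint)
print(h_star, round(eps_star, 10))
```

Under this reading, the optimal rule predicts 1 when X = 0 and 0 when X = 1, and the bound is min(0.1, 0.2) + min(0.4, 0.3) = 0.4.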

Step: 3

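For part (b), a sketch under the same assumption (that $\epsilon^*$ is the Bayes error rate): with equal variances, comparing $0.7 f_1(x)$ against $0.3 f_0(x)$ reduces to a single threshold, $x < 4 + \ln(7/3)$, below which the optimal rule predicts 1. The error is then the prior-weighted sum of the two tail probabilities. The variable names below are illustrative, and the normal CDF is built from the standard library's `math.erf`.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Problem data from part (b).
pi0, pi1 = 0.3, 0.7   # Pr(Y = 0), Pr(Y = 1)
mu0, mu1 = 5.0, 3.0   # class-conditional means of X
var = 2.0             # shared class-conditional variance
sigma = math.sqrt(var)

# Predict 1 when pi1 * f1(x) > pi0 * f0(x). With equal variances and
# mu0 > mu1 this is equivalent to x < t, where:
t = (mu0 + mu1) / 2.0 + var * math.log(pi1 / pi0) / (mu0 - mu1)

# Errors: Y = 0 but x < t (predicted 1), or Y = 1 but x >= t (predicted 0).
eps_star = (pi0 * normal_cdf((t - mu0) / sigma)
            + pi1 * (1.0 - normal_cdf((t - mu1) / sigma)))

print(f"threshold t = {t:.5f}")
print(f"eps_star    = {eps_star:.5f}")
```

In closed form this is $\epsilon^* = 0.3\,\Phi\big((t - 5)/\sqrt{2}\big) + 0.7\,\big[1 - \Phi\big((t - 3)/\sqrt{2}\big)\big]$ with $t = 4 + \ln(7/3)$, which evaluates to approximately 0.204.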
