Answered step by step
Verified Expert Solution
Question
1 Approved Answer
Consider a binary classification problem where there is a single feature X ER and the depen- dent variable Y = {0, 1}. Let Px,y
Consider a binary classification problem where there is a single feature X ER and the depen- dent variable Y = {0, 1}. Let Px,y denote the joint distribution over pairs (X,Y), and let h: R {0, 1} denote a generic classifier. We define the error rate of h as R(h) := Pr(Y #h(X)), where the probability is computed from the joint distribution Px,y. Suppose that we collect data (x1, y),..., (xn, Yn), which are assumed to be independent and identically distributed from the distribution Px,y, and that we train a classifier using this data. Below we consider two precise specifications of the joint distribution Px,y. In both cases, derive the largest numerical value * that you can for which it holds that R(h) e*. Carefully explain how you arrived at your specific value of * in both cases. a) (10 points) X is restricted to the set {0,1} (i.e., is categorical) and Px,y is given by the distribution in Table 1. b) (10 points) The marginal distribution of Y is given by Pr(Y = 0) = 0.3 and Pr(Y = 1) = 0.7. Given that Y = 0, the distribution of X is normal with mean 5 and variance 2. Given that Y = 1, the distribution of X is normal with mean 3 and variance 2. (It is OK for the final answer to be written in terms of the CDF of a normal distribution and/or to numerically approximate this number up to 5 digits) Table 1 Outcome (X=0, Y=0) (X=0, Y = 1) (X= 1, Y = 0) (X = 1, Y = 1) Probability 0.1 0.2 0.4 0.3 Consider a binary classification problem where there is a single feature X ER and the depen- dent variable Y = {0, 1}. Let Px,y denote the joint distribution over pairs (X,Y), and let h: R {0, 1} denote a generic classifier. We define the error rate of h as R(h) := Pr(Y #h(X)), where the probability is computed from the joint distribution Px,y. Suppose that we collect data (x1, y),..., (xn, Yn), which are assumed to be independent and identically distributed from the distribution Px,y, and that we train a classifier using this data. Below we consider two precise specifications of the joint distribution Px,y. In both cases, derive the largest numerical value * that you can for which it holds that R(h) e*. Carefully explain how you arrived at your specific value of * in both cases. a) (10 points) X is restricted to the set {0,1} (i.e., is categorical) and Px,y is given by the distribution in Table 1. b) (10 points) The marginal distribution of Y is given by Pr(Y = 0) = 0.3 and Pr(Y = 1) = 0.7. Given that Y = 0, the distribution of X is normal with mean 5 and variance 2. Given that Y = 1, the distribution of X is normal with mean 3 and variance 2. (It is OK for the final answer to be written in terms of the CDF of a normal distribution and/or to numerically approximate this number up to 5 digits) Table 1 Outcome (X=0, Y=0) (X=0, Y = 1) (X= 1, Y = 0) (X = 1, Y = 1) Probability 0.1 0.2 0.4 0.3 Consider a binary classification problem where there is a single feature X ER and the depen- dent variable Y = {0, 1}. Let Px,y denote the joint distribution over pairs (X,Y), and let h: R {0, 1} denote a generic classifier. We define the error rate of h as R(h) := Pr(Y #h(X)), where the probability is computed from the joint distribution Px,y. Suppose that we collect data (x1, y),..., (xn, Yn), which are assumed to be independent and identically distributed from the distribution Px,y, and that we train a classifier using this data. Below we consider two precise specifications of the joint distribution Px,y. In both cases, derive the largest numerical value * that you can for which it holds that R(h) e*. Carefully explain how you arrived at your specific value of * in both cases. a) (10 points) X is restricted to the set {0,1} (i.e., is categorical) and Px,y is given by the distribution in Table 1. b) (10 points) The marginal distribution of Y is given by Pr(Y = 0) = 0.3 and Pr(Y = 1) = 0.7. Given that Y = 0, the distribution of X is normal with mean 5 and variance 2. Given that Y = 1, the distribution of X is normal with mean 3 and variance 2. (It is OK for the final answer to be written in terms of the CDF of a normal distribution and/or to numerically approximate this number up to 5 digits) Table 1 Outcome (X=0, Y=0) (X=0, Y = 1) (X= 1, Y = 0) (X = 1, Y = 1) Probability 0.1 0.2 0.4 0.3
Step by Step Solution
There are 3 Steps involved in it
Step: 1
To find the largest numerical value for which the error rate of the classifier is greater than we need to calculate the error rate under two different ...Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started