Question:

Consider a binary classification problem where there is a single feature $X \in \mathbb{R}$ and the dependent variable $Y \in \{0, 1\}$. Let $P_{X,Y}$ denote the joint distribution over pairs $(X, Y)$, and let $h: \mathbb{R} \to \{0, 1\}$ denote a generic classifier. We define the error rate of $h$ as $R(h) := \Pr(Y \neq h(X))$, where the probability is computed from the joint distribution $P_{X,Y}$. Suppose that we collect data $(x_1, y_1), \ldots, (x_n, y_n)$, which are assumed to be independent and identically distributed from the distribution $P_{X,Y}$, and that we train a classifier using this data.

Below we consider two precise specifications of the joint distribution $P_{X,Y}$. In both cases, derive the largest numerical value $\epsilon^*$ that you can for which it holds that $R(h) \geq \epsilon^*$. Carefully explain how you arrived at your specific value of $\epsilon^*$ in both cases.

a) (10 points) $X$ is restricted to the set $\{0, 1\}$ (i.e., is categorical) and $P_{X,Y}$ is given by the distribution in Table 1.

b) (10 points) The marginal distribution of $Y$ is given by $\Pr(Y = 0) = 0.3$ and $\Pr(Y = 1) = 0.7$. Given that $Y = 0$, the distribution of $X$ is normal with mean 5 and variance 2. Given that $Y = 1$, the distribution of $X$ is normal with mean 3 and variance 2. (It is OK for the final answer to be written in terms of the CDF of a normal distribution and/or to numerically approximate this number up to 5 digits.)

Table 1
Outcome       (X=0, Y=0)   (X=0, Y=1)   (X=1, Y=0)   (X=1, Y=1)
Probability   0.1          0.2          0.4          0.3
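For reference, here is a sketch of the standard argument behind such a bound (the Bayes error). Conditional on $X$, a classifier that outputs $h(X)$ is wrong with probability $\Pr(Y \neq h(X) \mid X)$, which can never be smaller than the smaller of the two class posteriors. For a discrete $X$ this reduces to a sum over the values of $X$, which gives part (a) directly from Table 1:

```latex
\begin{aligned}
R(h) &= \Pr\big(Y \neq h(X)\big)
      = \mathbb{E}_X\!\left[\Pr\big(Y \neq h(X) \mid X\big)\right] \\
     &\geq \mathbb{E}_X\!\left[\min\{\Pr(Y = 0 \mid X),\, \Pr(Y = 1 \mid X)\}\right]
      =: \epsilon^* , \\[4pt]
\text{discrete } X:\quad
\epsilon^* &= \sum_{x} \min\{P(X = x, Y = 0),\, P(X = x, Y = 1)\}, \\
\text{part (a):}\quad
\epsilon^* &= \min\{0.1,\, 0.2\} + \min\{0.4,\, 0.3\} = 0.1 + 0.3 = 0.4 .
\end{aligned}
```

Equality holds for the Bayes classifier, which predicts whichever label has the larger posterior (equivalently, the larger joint probability $P(X = x, Y = y)$). Because a classifier trained on $(x_1, y_1), \ldots, (x_n, y_n)$ is still some function $h: \mathbb{R} \to \{0, 1\}$, the same bound applies to it.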

Step by Step Solution

To find the largest numerical value $\epsilon^*$ for which the error rate of the classifier satisfies $R(h) \geq \epsilon^*$, we need to calculate the error rate under two different ...
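A minimal sketch of that calculation in Python, assuming the Bayes-error argument: at each value of $X$, the best any classifier can do is to err with the smaller of the two class-weighted probabilities. The helper names `bayes_error_discrete` and `bayes_error_gaussian` are hypothetical, and part (b) uses the standard-library `statistics.NormalDist` (Python 3.8+) for the normal CDF $\Phi$.

```python
import math
from statistics import NormalDist

# Part (a): joint probabilities P(X = x, Y = y) from Table 1, keyed by (x, y).
JOINT_A = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.3}


def bayes_error_discrete(joint):
    """Bayes error for a discrete feature: at each x the Bayes classifier
    predicts the label with the larger joint probability, so the error it
    cannot avoid is the smaller one."""
    xs = {x for (x, _) in joint}
    return sum(min(joint[(x, 0)], joint[(x, 1)]) for x in xs)


def bayes_error_gaussian(p0, mu0, mu1, var):
    """Bayes error when X | Y=0 ~ N(mu0, var) and X | Y=1 ~ N(mu1, var).

    Assumes a shared variance and mu0 > mu1, so the Bayes rule predicts
    Y = 0 exactly when x exceeds a single threshold t.
    """
    p1 = 1.0 - p0
    sigma = math.sqrt(var)
    # Setting p0 * f0(t) = p1 * f1(t) and taking logs gives a linear equation in t.
    t = (mu0 + mu1) / 2 + var * math.log(p1 / p0) / (mu0 - mu1)
    # Errors: class 0 falling at or below t, plus class 1 falling above t.
    err0 = p0 * NormalDist(mu0, sigma).cdf(t)
    err1 = p1 * (1.0 - NormalDist(mu1, sigma).cdf(t))
    return t, err0 + err1


err_a = bayes_error_discrete(JOINT_A)
t_b, err_b = bayes_error_gaussian(p0=0.3, mu0=5.0, mu1=3.0, var=2.0)
print(f"(a) epsilon* = {err_a:.5f}")
print(f"(b) threshold t = {t_b:.5f}, epsilon* = {err_b:.5f}")
```

For part (b) the threshold is $t = 4 + \ln(7/3) \approx 4.8473$, so the bound takes the CDF form the question allows: $\epsilon^* = 0.3\,\Phi\big((t - 5)/\sqrt{2}\big) + 0.7\,\big[1 - \Phi\big((t - 3)/\sqrt{2}\big)\big]$, roughly 0.2041. The closed-form threshold relies on the two variances being equal; with unequal variances the log-ratio is quadratic in $x$ and the decision region can be two-sided.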