Question

1 Approved Answer

Posted on Jun 23, 2024

Question 1 (50% ) The probability density function (pdf) for a 2-dimensional real-valued random vector X is as follows: p(x) = P(L = 0)p(x|L =

Question 1 (50% ) The probability density function (pdf) for a 2-dimensional real-valued random vector X is as follows: p(x) = P(L = 0)p(x|L = 0) + P(L = 1)p(x|L= 1). Here L is the true class label that indicates which class-label-conditioned pdf generates the data. The class priors are P(L = 0) = 0.65 and P(L = 1) = 0.35. The class class-conditional pdfs are p(x L=0) = wig(x|mol, Col ) + wag(x moz, Co2) and p(x|L= 1) = g(x|m , Cj), where g(x |m, C) is a multivariate Gaussian probability density function with mean vector m and covariance matrix C. The parameters of the class-conditional Gaussian pdfs are: wi = w2 = 1/2, and moi = [8] Col = [6 9] moz = [;] Coz = [8 9] mi = ] CI = [6 :] For numerical results requested below, generate 10000 samples according to this data distribu- tion, keep track of the true class labels for each sample. Save the data and use the same data set in all cases. Part A: ERM classification using the knowledge of true data pdf: 1. Specify the minimum expected risk classification rule in the form of a likelihood-ratio test: P(x|2=1)3 P(XL=0) > Y, where the threshold y is a function of class priors and fixed (nonnegative) loss values for each of the four cases D = iL = j where DE 0, 1 is the decision. 2. Implement this classifier and apply it on the 10K samples you generated. Vary the threshold y gradually from 0 to co, and for each value of the threshold estimate P(D = 1|1 = 1; y) and P(D = 1/L = 0; y). Using these paired values, plot an approximation of the ROC curve of the minimum expected risk classifier. Note that at y= 0 The ROC curve should be at (1 ), and as y increases it should traverse towards (8). Due to the finite number of samples used to estimate probabilities, your ROC curve approximation should reach this destination value for a finite threshold value. 3. Determine the theoretically optimal threshold value that achieves minimum probability of er- ror, and on the ROC curve, superimpose (using a different color/shape marker) the operating point of this min-P(error) decision rule by evaluating its two confusion matrix entries needed for its ROC curve point. Estimate the minimum probability of error that is achievable for this data distribution using the dataset and this theoretically optimal threshold. In addition, by evaluating estimated P(error; y) = P(D = 1 /L = 0; y)P(L = 0) + P(D = 0|L = 1; y)P(L = 1) values, determine empirically using the dataset a threshold value that minimizes this esti- mated P(error) value. How does your empirically determined y value that minimizes P(error) compare with the theoretically optimal threshold you compute from priors and loss values?Part B: Repeat the same steps as in the previous two cases {draw ROE curve 3:. nd threshold that minimizes P{error}}, but this time using a Fisher Linear Discriminant Analysis {LEA} based classier. Using the lK available samples and their labels, estimate the class conditional mean and covariance matrices using sample average estimators. From these estimates, determine the Fisher LDA projection weight vector {via the generalized eigendecomposition of within and between class scatter matrices}: Wm. For the classication rule \"has" compared to a tresl'sr-ld t, which 1 takes values from o= to on, plot the RDC curve. Identify the tl'ueshold at which the pmbability of error {based on sample count estimates} is minimized, and mark that operating point on the RDC curve estimate. Discuss how this LDA classier performs relative to the previous two classiers. IIZZompare the P{error} achieved by LDA with that of the optimal design. Note: When nding the Fisher Lisa projection matrix, do not be concerned about the diference in the class priors. When determining the betweenclass and withinclass scatter matrices, use equal weights for the class means and covariances, like we did in class. You could argue for a weighted approach but in this case the model mismatch is a more serious issue to be concerned about