Question
The following dataset is given, and you are asked to classify the data point (15, C) using variants of the k-NN classifier. The dataset contains continuous and categorical attributes and a binary class label. The weighted distance is

d_ij(x_i, x_j) = w_1 (x_{1i} - x_{1j})^2 + w_2 · 1[x_{2i} ≠ x_{2j}]    (eq. 2)

where w_1 and w_2 are the mutual information between the class label Y and the (normalized) attributes X_1 and X_2, respectively.

Note that we have previously calculated mutual information for two categorical random variables, but here we have to calculate the mutual information between a continuous and a binary variable. The best practice is to approximate the continuous variable with a categorical one. Assume the categorical approximation of X_1, which we call X_1c, has three states defined by thresholds on the normalized X_1:

X_1c = 0 if X_1 < -1,  1 if -1 ≤ X_1 ≤ 1,  2 if X_1 > 1

First, rewrite the data table with X_1c, and then find w_1 = MI(X_1c, Y) and w_2 = MI(X_2, Y). Hint: you need to construct the probability tables for P(X_1c, Y), P(X_2, Y), P(X_1c), P(X_2), and P(Y), where

MI(X, Y) = Σ_{(x,y)} P(X=x, Y=y) log [ P(X=x, Y=y) / (P(X=x) P(Y=y)) ].

- [5 points] Classify (15, C) using 1-NN and 3-NN with the mutual-information-weighted distance in eq. 2. Note: when you calculate the distance you need to normalize X_1 of the unseen point (using the mean and standard deviation that you find in Part 1) and use that to calculate the distance, not the categorized version. The categorized version is only used to calculate w_1.
Step by Step Solution
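The original data table did not survive extraction, so the procedure can only be sketched on a small hypothetical dataset. The sketch below is not the graded answer: the rows in `data`, the binning thresholds at ±1 on the normalized scale, and the helper names (`to_cat`, `mutual_information`, `dist`) are all illustrative assumptions. It walks the full pipeline: normalize X_1, categorize it into X_1c, compute the MI weights w_1 and w_2, then classify the query (15, C) with 1-NN and 3-NN under the weighted distance.

```python
import math
from collections import Counter

# Hypothetical dataset (the original table is not in the page): each row is
# (X1 continuous, X2 categorical, Y binary class label).
data = [
    (5, 'A', 0), (10, 'B', 0), (20, 'C', 1),
    (25, 'C', 1), (12, 'A', 0), (18, 'B', 1),
]

# Part 1: normalize X1 with the training mean and (population) std.
x1 = [r[0] for r in data]
mean = sum(x1) / len(x1)
std = math.sqrt(sum((v - mean) ** 2 for v in x1) / len(x1))
z = [(v - mean) / std for v in x1]

# Categorical approximation X1c; thresholds at -1 and +1 on the normalized
# scale are an assumption for illustration.
def to_cat(zv):
    if zv < -1:
        return 0
    return 1 if zv <= 1 else 2

x1c = [to_cat(v) for v in z]
x2 = [r[1] for r in data]
y = [r[2] for r in data]

def mutual_information(a, b):
    """MI(A,B) = sum over (a,b) of P(a,b) * log( P(a,b) / (P(a) P(b)) )."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[u] / n) * (pb[v] / n)))
               for (u, v), c in pab.items())

w1 = mutual_information(x1c, y)  # MI(X1c, Y)
w2 = mutual_information(x2, y)   # MI(X2, Y)

# Weighted distance (eq. 2): d = w1*(z1i - z1j)^2 + w2*1[x2i != x2j].
def dist(zq, cq, zj, cj):
    return w1 * (zq - zj) ** 2 + w2 * (1 if cq != cj else 0)

# Classify the unseen point (15, 'C'): normalize its X1 with the training
# mean/std; the categorized version is only used for the weights above.
zq = (15 - mean) / std
dists = sorted((dist(zq, 'C', z[i], x2[i]), y[i]) for i in range(len(data)))
pred_1nn = dists[0][1]
pred_3nn = Counter(lbl for _, lbl in dists[:3]).most_common(1)[0][0]
```

On this toy table the MI weights come out to w_1 = (1/3)·ln 2 and w_2 = (2/3)·ln 2, so the categorical attribute dominates the distance; with the real table from the assignment only `data` needs to change.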