Question
4. Assume that we are using the dataset in the table below as our labeled training dataset, and we want to predict whether a query instance with SPEED = 6.75 and AGILITY = 3.00 is likely to be drafted or not. The table gives the SPEED and AGILITY ratings for 20 college athletes and whether they were drafted by a professional team.

a) Plot the data vectors in the 2D plane (use AGILITY for the vertical axis and SPEED for the horizontal axis).

b) Considering the 1-NN algorithm, draw the separating boundaries for the classes.

c) Considering the 1-NN algorithm, determine the class label for a query instance with SPEED = 6.75 and AGILITY = 3.00 using the Euclidean distance metric.

d) Considering the 1-NN algorithm, plot the query instance with a "Yes" class label and redraw the updated decision boundaries for the classes, assuming the training data now has 21 data points. You may approximate the boundaries visually; exact computation is not necessary.

e) The top right corner of the feature space contains a "No" region. This region exists because one of the No instances lies far away from the rest of the instances with this target level. Since all the immediate neighbors of this instance have the Yes target level, it is likely that either this instance has been incorrectly labeled and should have a target feature value of Yes, or one of its descriptive features has an incorrect value and the instance is therefore in the wrong location in the feature space. Either way, this instance is likely to be an example of noise in the dataset. Determine a value of k such that this noise region becomes part of the Yes region. Redraw the decision boundaries (separating lines) for the classes for this value of k by visual approximation. Discuss whether this value of k is reasonable, given the implications of choosing a small vs. a large k for a given problem.

f) Discuss the impact of using the weighted k-NN algorithm on the decision boundaries for k = 21 (the training data now includes the query instance as well), reasoning from the weighted k-NN formula that uses inverse squared distances as the weighting factor. Take the distance metric to be Euclidean. What is the class label for the top right triangle data point now?
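
The rules named in parts c), e) and f) can be tried out with a short script. What follows is a minimal sketch, not a worked solution: the rows in TRAIN are made-up placeholder values, NOT the 20 athletes from the question's table, and the function names are hypothetical. It assumes plain majority voting for k-NN and a weight of 1/d^2 per neighbor for the weighted variant, with a zero-distance neighbor deciding the label outright.

```python
import math
from collections import Counter, defaultdict

# PLACEHOLDER rows (speed, agility, drafted). These are invented for
# illustration and are NOT the values from the question's table.
TRAIN = [
    (2.0, 2.0, "No"),
    (3.0, 3.0, "No"),
    (2.5, 4.0, "No"),
    (7.0, 7.0, "Yes"),
    (6.0, 6.5, "Yes"),
    (7.5, 6.0, "Yes"),
    (8.0, 8.0, "No"),  # isolated "No" standing in for the noisy instance
]


def euclidean(p, q):
    """Euclidean distance between two (speed, agility) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def neighbours(train, query, k):
    """Return the k training rows closest to the query point."""
    return sorted(train, key=lambda row: euclidean(row[:2], query))[:k]


def knn_predict(train, query, k=1):
    """Unweighted k-NN: majority vote among the k nearest rows."""
    votes = Counter(label for _, _, label in neighbours(train, query, k))
    return votes.most_common(1)[0][0]


def weighted_knn_predict(train, query, k):
    """Weighted k-NN: each neighbour votes with weight 1 / distance^2."""
    scores = defaultdict(float)
    for x, y, label in neighbours(train, query, k):
        d = euclidean((x, y), query)
        if d == 0.0:
            # The query coincides with a training row; its weight would be
            # unbounded, so that row's label decides the prediction.
            return label
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)


if __name__ == "__main__":
    near_noise = (8.1, 8.1)  # a point right next to the isolated "No"
    print(knn_predict(TRAIN, near_noise, k=1))
    print(knn_predict(TRAIN, near_noise, k=3))
    print(weighted_knn_predict(TRAIN, near_noise, k=3))
```

On this placeholder data, the point next to the isolated "No" is labeled No by 1-NN and Yes by unweighted 3-NN, while inverse-squared weighting with k = 3 labels it No again because the one very close neighbor dominates the sum. To apply the sketch to the actual exercise, replace TRAIN with the rows of the question's table and query the point (6.75, 3.00).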