Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

A k - nearest neighbour algorithm is used to develop a classifier. The data set contains noise. Which of the following strategies will help to

A k-nearest neighbour algorithm is used to develop a classifier. The data set contains noise. Which of the
following strategies will help to reduce sensitivity to noise? Simply write down the letter(s) of the correct
approach(es). Note that incorrect answers will be penalized.
(a) Reduce the value of k
(b) Use a larger value for k
(c) Find all Tomek links and remove both instances that form the Tomek link
(d) Nothing can be done to reduce sensitivity to noise
(e) Use SMOTE to oversample the minority class
Consider a regression problem, where the data set has the following characteristics:
There are 185 instances
There is one categorical and five numeric descriptive features
The categorical feature has three possible values. One of these values occurs for 70% of the instances,
and the other two values occur respectively for 13% and 17% instances
Four of the numeric features have values in the range 0,1, and the fifth numeric feature has values
in the range 100,100000
For 1% of the instances there are outliers for the target feature
One of the numeric descriptive features has a few outliers
One of the numeric descriptive features has 3% missing values
A k-nearest neighbour algorithm is used and Euclidean distance is used as the similarity measure. Which
of the following statements are correct? Simply write down the letter(s) of the correct statement(s). Be
careful; marks will be subtracted for incorrect answers.
(a) The value for k has to be large
(b) One-hot encoding has to be applied to the categorical feature
(c) The numerical features have to be scaled to the same range, e.g.0,10
(d) The outliers in the numeric descriptive feature have to be removed
(e) The missing values have to be imputed
(f) Under-sampling or over-sampling has to be applied to the categorical feature to balance the distri-
bution of possible values
(g) The predicted value is calculated using the average over the target values of the neighbors
What is the inductive bias of the k-nearest neighbour algorithm?
Is it necessary to normalize input features when a k-nearest neighbour algorithm is used? Motivate your
answer.
Explain how k-nearest neighbours can be used to impute missing values.
Can the k-nearest neighbour algorithm be applied to problems with categorical descriptive features?
Motivate your answer.
Discuss the consequences of different values for k when k-nearest neigbours is applied to regression prob-
lerns.
AML874: Describe how k-nearest neighbours can be used to classify images.
AML874: Describe how k-nearest neighbours can be used to correct misspelt words using an example. How
would you decide how to modify the original word if k is equal to three?
image text in transcribed

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Machine Learning And Knowledge Discovery In Databases European Conference Ecml Pkdd 2014 Nancy France September 15 19 2014 Proceedings Part 2 Lnai 8725

Authors: Toon Calders ,Floriana Esposito ,Eyke Hullermeier ,Rosa Meo

2014th Edition

3662448505, 978-3662448502

More Books

Students also viewed these Databases questions