Question:
3. The predictive task in this question is to predict the level of corruption in a country based on a range of macro-economic and social features.
The table below lists some countries described by the following descriptive features:
LIFE EXP., the mean life expectancy at birth
TOP-10 INCOME, the percentage of the annual income of the country that goes to the top 10% of earners
INFANT MORT., the number of infant deaths per 1,000 births
MIL. SPEND, the percentage of GDP spent on the military
SCHOOL YEARS, the mean number of years spent in school by adult females
The target feature is the Corruption Perception Index (CPI). The CPI measures the perceived levels of corruption in the public sector of countries and ranges from 0 (highly corrupt) to 100 (very clean).
We will use Russia as our query country for this question. The table below lists the descriptive features for Russia.
a. What value would a 3-nearest neighbor prediction model using Euclidean distance return for the CPI of Russia?
b. What value would a weighted k-NN prediction model return for the CPI of Russia? Use k = 16 (i.e., the full dataset) and a weighting scheme of the reciprocal of the squared Euclidean distance between the neighbor and the query.
c. The descriptive features in this dataset are of different types. For example, some are percentages, others are measured in years, and others are measured in counts per 1,000. We should always consider normalizing our data, but it is particularly important to do this when the descriptive features are measured in different units. What value would a 3-nearest neighbor prediction model using Euclidean distance return for the CPI of Russia when the descriptive features have been normalized using range normalization?
d. What value would a weighted k-NN prediction model, with k = 16 (i.e., the full dataset) and a weighting scheme of the reciprocal of the squared Euclidean distance between the neighbor and the query, return for the CPI of Russia when it is applied to the range-normalized data?
e. The actual 2011 CPI for Russia was 2.4488. Which of the predictions made was the most accurate? Why do you think this was?
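The three techniques that parts a to d combine (k-NN averaging, weighting by the reciprocal of the squared Euclidean distance, and range normalization) can be sketched in Python as below. This is a minimal illustration, not the book's worked solution: the function names are hypothetical, and the two-feature `toy_rows`, `toy_targets`, and `toy_query` values are made up purely to exercise the code, because the country table is not reproduced here. Range normalization is assumed to map each feature to [0, 1] using the minimum and maximum found in the dataset rows, with the query scaled by those same bounds.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(rows, targets, query, k):
    """Parts a and c: mean target value of the k rows nearest to the query."""
    ranked = sorted(zip(rows, targets), key=lambda rt: euclidean(rt[0], query))
    nearest = ranked[:k]
    return sum(t for _, t in nearest) / len(nearest)


def weighted_knn_predict(rows, targets, query, k=None):
    """Parts b and d: weighted mean with weights 1 / distance^2.

    k=None uses every row, which matches the "k = full dataset" setting.
    """
    ranked = sorted(zip(rows, targets), key=lambda rt: euclidean(rt[0], query))
    nearest = ranked if k is None else ranked[:k]
    weighted_sum = 0.0
    weight_total = 0.0
    for row, target in nearest:
        dist_sq = sum((x - y) ** 2 for x, y in zip(row, query))
        if dist_sq == 0.0:
            # Assumption: an exact match decides the prediction outright,
            # which also avoids dividing by zero.
            return target
        weight = 1.0 / dist_sq
        weighted_sum += weight * target
        weight_total += weight
    return weighted_sum / weight_total


def range_normalize(rows, query):
    """Scale each feature to [0, 1] using the min and max over the rows.

    The query is scaled with the same bounds, so it can fall outside [0, 1]
    if it lies beyond the range seen in the rows.
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]

    def scale(vector):
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant feature -> 0
            for v, lo, hi in zip(vector, mins, maxs)
        ]

    return [scale(row) for row in rows], scale(query)


# Made-up two-feature data, NOT the values from the question's table.
toy_rows = [[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]]
toy_targets = [2.0, 4.0, 9.0]
toy_query = [3.0, 0.0]

raw_prediction = knn_predict(toy_rows, toy_targets, toy_query, k=2)
raw_weighted = weighted_knn_predict(toy_rows, toy_targets, toy_query)

norm_rows, norm_query = range_normalize(toy_rows, toy_query)
norm_prediction = knn_predict(norm_rows, toy_targets, norm_query, k=2)
norm_weighted = weighted_knn_predict(norm_rows, toy_targets, norm_query)
```

To answer the question itself, `toy_rows` and `toy_targets` would be replaced with the five descriptive features and CPI values of the 16 countries in the table, and `toy_query` with Russia's feature values; comparing the four resulting predictions against the actual CPI then addresses part e.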
Step by Step Answer:
Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies
ISBN: 9780262029445
1st Edition
Authors: John D. Kelleher, Brian Mac Namee, Aoife D'Arcy