Cosine similarity is often a good choice when dealing with sparse non-binary data. What target level would a 3-NN model using cosine similarity return for the query?

4. (Exercise 3 of Chapter 5) The predictive task in this question is to predict the level of corruption in a country based on a range of macroeconomic and social features. The table below lists some countries described by the following descriptive features:

(1) "Life Exp.", the mean life expectancy at birth;
(2) "Top-10 Income", the percentage of the annual income of the country that goes to the top 10% of earners;
(3) "Infant Mort.", the number of infant deaths per 1,000 births;
(4) "Mil. Spend", the percentage of GDP spent on the military;
(5) "School Years", the mean number of years spent in school by adult females.

(Note: consider using Excel formulas to simplify your calculations; a spreadsheet of the data is uploaded to eCourse with this homework.)

The target feature is the Corruption Perception Index (CPI). The CPI measures the perceived levels of corruption in the public sector of countries and ranges from 0 (highly corrupt) to 10 (very clean).

Country ID    Life Exp.  Top-10 Income  Infant Mort.  Mil. Spend  School Years  CPI
Afghanistan   59.61      23.21          74.30         4.44         0.40         1.5171
Haiti         45.00      47.67          73.10         0.09         3.40         1.7999
Nigeria       51.30      38.23          82.60         1.07         4.10         2.4493
Egypt         70.48      26.58          19.60         1.86         5.30         2.8622
Argentina     75.77      32.30          13.30         0.76        10.10         2.9961
China         74.87      29.98          13.70         1.95         6.40         3.6356
Brazil        73.12      42.93          14.50         1.43         7.20         3.7741
Israel        81.30      28.80           3.60         6.77        12.50         5.8069
U.S.A         78.51      29.85           6.30         4.72        13.70         7.1357
Ireland       80.15      27.23           3.50         0.60        11.50         7.5360
U.K.          80.09      28.49           4.40         2.59        13.00         7.7751
Germany       80.24      22.07           3.50         1.31        12.00         8.0461
Australia     82.09      24.79           4.90         1.42        14.20         8.6725
Canada        80.99      25.40           4.20         1.86        11.50         8.8442
Sweden        81.43      22.18           2.40         1.27        12.80         9.2985
New Zealand   80.67      27.81           4.90         1.13        12.30         9.4627

We will use Russia as our query country for this question. The table below lists the descriptive features for Russia.

Country ID    Life Exp.  Top-10 Income  Infant Mort.  Mil. Spend  School Years  CPI
Russia        67.62      31.68          10.00         3.87        12.90         ?

(a) What value would a 3-nearest neighbor prediction model using Euclidean distance return for the CPI of Russia?

(b) What value would a weighted k-NN prediction model return for the CPI of Russia? Use k = 16 (i.e., the full dataset) and a weighting scheme of the reciprocal of the squared Euclidean distance between the neighbor and the query.

(c) The descriptive features in this dataset are of different types. For example, some are percentages, others are measured in years, and others are measured in counts per 1,000. We should always consider normalizing our data, but it is particularly important to do this when the descriptive features are measured in different units. What value would a 3-nearest neighbor prediction model using Euclidean distance return for the CPI of Russia when the descriptive features have been normalized using range normalization?

(d) What value would a weighted k-NN prediction model with k = 16 (i.e., the full dataset), using a weighting scheme of the reciprocal of the squared Euclidean distance between the neighbor and the query, return for the CPI of Russia when it is applied to the range-normalized data?

(e) The actual 2011 CPI for Russia was 2.4488. Which of the predictions made was the most accurate? Why do you think this was?
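
For anyone who would rather script the calculations than use a spreadsheet, the three operations the question relies on (a k-NN mean, a reciprocal-squared-distance weighted mean, and range normalization) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reference solution: the function names knn_mean, weighted_knn and range_normalize are hypothetical, the normalization ranges are taken from the training rows only, and only the first six rows of the table are included to keep it short, so the printed values are not the answers to parts (a) to (d).

```python
import math

# Assumption: only the first six rows of the table are used here, for brevity.
# Each entry is ([life_exp, top10_income, infant_mort, mil_spend, school_years], cpi).
ROWS = {
    "Afghanistan": ([59.61, 23.21, 74.30, 4.44, 0.40], 1.5171),
    "Haiti":       ([45.00, 47.67, 73.10, 0.09, 3.40], 1.7999),
    "Nigeria":     ([51.30, 38.23, 82.60, 1.07, 4.10], 2.4493),
    "Egypt":       ([70.48, 26.58, 19.60, 1.86, 5.30], 2.8622),
    "Argentina":   ([75.77, 32.30, 13.30, 0.76, 10.10], 2.9961),
    "China":       ([74.87, 29.98, 13.70, 1.95, 6.40], 3.6356),
}
QUERY = [67.62, 31.68, 10.00, 3.87, 12.90]  # Russia


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_mean(rows, query, k):
    """Mean target value of the k rows closest to the query."""
    nearest = sorted(rows.values(), key=lambda r: euclidean(r[0], query))[:k]
    return sum(cpi for _, cpi in nearest) / k


def weighted_knn(rows, query):
    """Weighted mean over all rows, weight = 1 / squared Euclidean distance.

    Raises ZeroDivisionError if the query coincides exactly with a row.
    """
    pairs = [(1.0 / euclidean(x, query) ** 2, cpi) for x, cpi in rows.values()]
    return sum(w * cpi for w, cpi in pairs) / sum(w for w, _ in pairs)


def range_normalize(rows, query):
    """Rescale each feature to [0, 1] using the min and max of the rows.

    Assumption: the query is scaled with the rows' ranges, so its values
    may fall outside [0, 1]. A constant column would divide by zero.
    """
    cols = list(zip(*(x for x, _ in rows.values())))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]

    def scale(v):
        return [(a - l) / (h - l) for a, l, h in zip(v, lo, hi)]

    return {name: (scale(x), cpi) for name, (x, cpi) in rows.items()}, scale(query)


print("3-NN, raw features:       ", knn_mean(ROWS, QUERY, 3))
print("weighted k-NN, raw:       ", weighted_knn(ROWS, QUERY))
norm_rows, norm_query = range_normalize(ROWS, QUERY)
print("3-NN, range-normalized:   ", knn_mean(norm_rows, norm_query, 3))
print("weighted k-NN, normalized:", weighted_knn(norm_rows, norm_query))
```

To work the actual exercise, extend ROWS with the remaining ten countries; with all 16 rows present, weighted_knn corresponds to the k = 16 setting in parts (b) and (d).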