Question
All answers to problem set questions must be typed so they can be reviewed by Turnitin. Please submit your completed problem set through Canvas using the instructions found on the syllabus.

Problem 1 (2 points)
Distance is a key notion underlying many data mining algorithms, such as k-nearest neighbor (k-NN). Why can it be a problem to compare customers using regular Euclidean distance, such as when they are described by age (in years), income (in dollars), and number of credit cards? How can this problem be fixed?

Income values typically span a much wider numeric range than age or number of credit cards, so income dominates the Euclidean distance and therefore has the strongest influence on predictions. To address this, we can normalize the data: select a scale factor for each input attribute and rescale the inputs along each dimension accordingly, so that the variance is balanced on every axis. Normalizing in this way reduces the dominance of income and lets every attribute contribute to the distance.

Problem 2 (3 points)
You currently work for Aperture Science, a small company that sells information technology (IT) products. The lone data scientist at Aperture approaches you one day and proposes to use k-NN estimation to build a model to predict the IT budget of companies to identify potential new clients. They would like your help building and deploying the model. The only data you have on hand is a sample of companies across the United States, which includes their IT budget for last year, their total revenue last year, their total number of employees last year, and their industry classification. This data will make up your database of potential neighbors. Ultimately, as a first true test of the model, you want to predict the IT budget for Acme Corp., a potential client whose IT budget you do not know (but you know their total revenue, number of employees, and industry classification).
A) Given the information above, explain how you could estimate Acme's IT budget using k-NN.
B) If you chose k = N, the total number of training examples, what would be the effect?
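The two problems can be illustrated together with a minimal sketch. It assumes z-score scaling as the normalization, a plain average of the k nearest neighbors as the estimate, and a handful of made-up companies; the function names `zscore_columns` and `knn_estimate` and every number in it are hypothetical and do not come from the problem set. Industry classification is left out for brevity; one option would be to restrict the neighbors to companies in the same industry.

```python
import math
import statistics


def zscore_columns(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    scaled = [
        [(v - m) / s for v, m, s in zip(row, means, stdevs)]
        for row in rows
    ]
    return scaled, means, stdevs


def knn_estimate(train_x, train_y, query, k):
    """Average the targets of the k training rows closest to the query."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical sample: (revenue in $M, employees) -> IT budget in $M.
features = [[10.0, 50.0], [12.0, 60.0], [200.0, 900.0], [220.0, 1000.0]]
budgets = [0.5, 0.6, 9.0, 10.0]

# Normalize the database, then scale the query with the SAME means and stdevs.
scaled, means, stdevs = zscore_columns(features)
acme_raw = [11.0, 55.0]  # hypothetical figures for Acme Corp.
acme = [(v - m) / s for v, m, s in zip(acme_raw, means, stdevs)]

estimate_k2 = knn_estimate(scaled, budgets, acme, k=2)
estimate_kN = knn_estimate(scaled, budgets, acme, k=len(budgets))
```

With k = 2 the estimate averages only the two companies most similar to Acme. With k = N every company in the database is a neighbor, so the estimate collapses to the overall mean IT budget no matter what Acme's revenue or headcount is, and the model ignores the inputs entirely.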
Step by Step Solution
Step: 1
Answer, Problem 1: Distance is a key notion underlying many data mining algorithms, such as k-nearest neighbor (k-NN). However, it can be problematic to compar...