Question
Group Project on unsupervised learning using R: Cars93
Instructions
Use the Cars93 data set from the built-in package "MASS" available in R.
You are to analyse the data using unsupervised learning. Marks will be awarded by:
1. Preparing the data (20 marks)
2. Principal Components Analysis (25 marks)
3. K-means and Hierarchical Clustering (25 marks)
4. Decision Trees (25 marks)
The remaining 5 marks will be allocated to presentation, following instructions, etc.
You are to submit (softcopy only):
A .pdf file of a report on the data preparation, result outputs and interpretations, and possible implications/conclusions from your data analysis.
  o Use Microsoft Word to write the report, then save or print it as a .pdf file for submission.
  o Include discussions, diagrams, tables, etc. that are relevant to your analysis.
  o There is no word limit, but keep the report concise. Everything discussed and presented must be relevant.
A copy of the R code used in doing your analysis for the project (e.g. a .txt file).
  o Include any relevant title lines (e.g. project name, date, etc.) as #comments as well.
  o When I run your code line by line, I expect it to execute smoothly. Any errors will incur penalties commensurate with their nature.
****Any packages and functions referred to below are just suggestions. You can use different methods/packages/functions to achieve your result.****
Preparing the data (20%)
Move "Model" to column 1. One way is to use the dplyr package from the tidyverse.
Let the model names be your rownames (not to be used as data, but will be the name of each row). See James textbook, page 55 for some guidance.
Ensure that the data is complete before doing the analysis. Look for "NA" entries, especially in the "Rear.seat.room" and "Luggage.room" columns. You may choose to replace "NA" with 0.
Look at the spelling of "Chrysler" as you go down the column. Fix the error.
"Cylinders" has a non-numeric term. Make an appropriate decision.
If necessary, replace qualitative data with quantitative data.
For binary data, e.g. Origin: USA/non-USA can be coded 1 or 0. Do you need two columns, or one? You can use the dummy( ) function from the dummies package.
For nominal or ordinal data, e.g. DriveTrain, you can use three binary columns: Front: 1 or 0, Rear: 1 or 0, and 4WD: 1 or 0. You can use the dummy_cols( ) function in the fastDummies package.
Remove the original columns of categorical/binary data if dummies have been substituted.
You can remove columns of data that you think are unnecessary, but you must justify that decision.
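The preparation steps above can be sketched roughly as follows. This is a minimal illustration in base R rather than the suggested dplyr/fastDummies route, and it rests on several assumptions that are not part of the brief: the misspelling is taken to be "Chrylser" (confirm with table() first), "rotary" is recoded to 2 as one possible judgment call, the "Make" column is dropped as redundant, and the names cars and cars_num are purely illustrative.

```r
library(MASS)  # provides the Cars93 data set

cars <- Cars93

# Move Model to column 1, then use it as row names and drop the column.
cars <- cars[, c("Model", setdiff(names(cars), "Model"))]
rownames(cars) <- as.character(cars$Model)
cars$Model <- NULL

# Count missing values per column; only columns containing NAs are printed.
na_counts <- colSums(is.na(cars))
print(na_counts[na_counts > 0])

# Replace NA with 0, as the brief permits (a median would be an alternative).
cars$Rear.seat.room[is.na(cars$Rear.seat.room)] <- 0
cars$Luggage.room[is.na(cars$Luggage.room)] <- 0

# Inspect the manufacturer names, then fix the misspelling.
# ASSUMPTION: the bad entry is spelled "Chrylser".
cars$Manufacturer <- as.character(cars$Manufacturer)
print(table(cars$Manufacturer))
cars$Manufacturer[cars$Manufacturer == "Chrylser"] <- "Chrysler"

# Cylinders is a factor with a "rotary" level.
# ASSUMPTION: recode it to 2; dropping that row is an equally defensible choice.
cyl <- as.character(cars$Cylinders)
cyl[cyl == "rotary"] <- "2"
cars$Cylinders <- as.numeric(cyl)

# Binary variables need only one 0/1 column each.
cars$Origin.USA <- as.integer(cars$Origin == "USA")
cars$Origin <- NULL
cars$Man.trans.avail <- as.integer(cars$Man.trans.avail == "Yes")

# Nominal variables: one 0/1 column per level, then drop the original column.
for (col in c("DriveTrain", "Type", "AirBags")) {
  dummies <- model.matrix(~ . - 1, data = cars[, col, drop = FALSE])
  colnames(dummies) <- make.names(colnames(dummies))
  cars <- cbind(cars[, names(cars) != col, drop = FALSE], dummies)
}

# ASSUMPTION: Make only repeats Manufacturer + Model, so it is removed.
cars$Make <- NULL

# Numeric-only version for PCA and clustering (Manufacturer stays in cars as a label).
cars_num <- cars[, vapply(cars, is.numeric, logical(1)), drop = FALSE]
str(cars_num)
```

Keeping one indicator column per level (rather than dropping a reference level) matches the "Front/Rear/4WD" layout described in the brief; whichever columns you remove, the report should state why.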
Principal Components Analysis (25%)
Perform PCA on your Cars93 Data.
Display and report on your findings (this is totally up to you).
Use your graphics, and explain how you choose 3 possible cars to suggest to each of the following customers:
(a) A student wants a cheap, fuel-efficient car, and is not concerned about its origin.
(b) A mom with four young children can only drive automatic and loves US cars.
(c) A middle-aged executive wants a sporty, non-US vehicle. For him, price is not a factor.
(d) A consulate wants a luxury midsize sedan for its personnel. They are not willing to purchase US-made vehicles because of a political situation.
(e) A family of 6-footers who want a midsize car, not a van.
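A PCA along these lines might look like the sketch below. To stay self-contained it uses only the columns that are already numeric in Cars93 and sets NA to 0; a full analysis would presumably use the prepared data with its dummy columns instead. The object names (num, pca, pve, shortlist) are illustrative, and the shortlist for customer (a) is just one hedged way to cross-check what the biplot suggests.

```r
library(MASS)  # provides the Cars93 data set

# ASSUMPTION: numeric columns only, NA replaced by 0, Model used as row names.
num <- Cars93[, vapply(Cars93, is.numeric, logical(1))]
num[is.na(num)] <- 0
rownames(num) <- as.character(Cars93$Model)

# Standardise the variables, since they are measured on very different scales.
pca <- prcomp(num, scale. = TRUE)

# Proportion of variance explained, cumulative over the first five components.
pve <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(pve)[1:5], 3))

# Loadings of the first two components, used to interpret the axes.
print(round(pca$rotation[, 1:2], 2))

# Biplot of cars (scores) and variables (loadings) on PC1 and PC2.
biplot(pca, cex = 0.5)

# Customer (a): rank by low price, breaking ties by high city MPG, and
# compare these candidates with their positions on the biplot.
shortlist <- head(num[order(num$Price, -num$MPG.city), c("Price", "MPG.city")], 3)
print(shortlist)
```

The sign of each component is arbitrary, so read the direction of "cheap" or "large" from the loading arrows on the biplot rather than from the sign of the scores alone.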
Further instructions to follow on:
Clustering
Decision Trees