Question
2.11 The dataset ToyotaCorolla.csv contains data on used cars on sale during the late summer of 2004 in the Netherlands. It has 1436 records containing details on 38 attributes, including Price, Age, Kilometers, HP, and other specifications.
# used cars on sale in 2004 in the Netherlands
# read the dataset ToyotaCorolla.csv into a data frame called "orig.df"
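# stringsAsFactors = TRUE keeps text columns as factors, so that levels() works (R >= 4.0 reads them as character by default)
orig.df <- read.csv("datasets/ToyotaCorolla.csv", stringsAsFactors = TRUE)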
# view the dataset
# Questions:
# Our main task is to build a prediction model
# that estimates the price of a used car from its attributes
# 1) The dataset has two categorical attributes, Fuel Type and Color.
# Using R's functions, transform these categorical data into dummies.
# how many additional columns will be added?
levels(orig.df$Fuel_Type) # 3 here
levels(orig.df$Color) # 10 here
# the following code will create a dataset including dummy variables
# for Fuel_Type and Color
# The new dataframe will be called "origDummies.df"
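One way to approach question 1 is base R's model.matrix(), sketched below on a made-up stand-in for orig.df so that the block runs without the CSV file. The toy rows and the names toy.df and toyDummies.df are illustrative assumptions; on the real data the same calls would be applied to orig.df to build origDummies.df.

```r
# Toy stand-in for orig.df (made-up rows, NOT the real data)
toy.df <- data.frame(
  Price     = c(13500, 13750, 13950, 14950, 13750, 12950),
  Fuel_Type = factor(c("Diesel", "Petrol", "CNG", "Petrol", "Diesel", "Petrol")),
  Color     = factor(c("Blue", "Silver", "Blue", "Black", "Red", "Silver"))
)

# model.matrix() expands each factor into 0/1 indicator columns.
# With an intercept, the first level of each factor is dropped as the
# reference level; [, -1] then removes the intercept column itself.
dummies <- model.matrix(~ Fuel_Type + Color, data = toy.df)[, -1]

# Replace the two factor columns with their dummy columns
keep.cols <- !(names(toy.df) %in% c("Fuel_Type", "Color"))
toyDummies.df <- cbind(toy.df[, keep.cols, drop = FALSE], dummies)

names(toyDummies.df)
```

Because model.matrix() keeps k - 1 indicator columns for a factor with k levels, the 3 fuel types and 10 colors noted above would yield 2 + 9 = 11 dummy columns in place of the 2 original columns, a net gain of 9. Keeping an indicator for every level instead would give 13 dummy columns, a net gain of 11.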
# 2) Prepare the dataset (i.e. the one with the dummies) by creating partitions in R.
# Select all the variables and use 1 for the random seed and
# partitioning percentages for training (50%), validation (50%).
# setting the seed to a fixed number
# how many total rows are there in the origDummies.df?
# randomly select row numbers for the training partition
# randomly select row numbers for the validation partition: sample from (all rows - training rows)
# Now create the train.data and valid.data dataframes
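The partitioning steps in question 2 could look roughly like the sketch below. It uses a made-up 10-row stand-in (toyDummies.df, an assumption for illustration) so that it runs on its own; on the real data the same lines would be applied to origDummies.df.

```r
# Made-up stand-in for origDummies.df (values are illustrative only)
toyDummies.df <- data.frame(
  Id    = 1:10,
  Price = c(13500, 13750, 13950, 14950, 13750, 12950, 16900, 18600, 21500, 12950)
)

# Fix the seed so the random split is reproducible
set.seed(1)

# Total number of rows
n.rows <- nrow(toyDummies.df)

# Randomly pick half of the row numbers for training
train.rows <- sample(seq_len(n.rows), size = round(0.5 * n.rows))

# With a 50/50 split, every remaining row goes to validation
valid.rows <- setdiff(seq_len(n.rows), train.rows)

# Build the two partitions
train.data <- toyDummies.df[train.rows, ]
valid.data <- toyDummies.df[valid.rows, ]
```

For the 1436 records described in the question, this split would put 718 rows in each partition.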
# 3) Propose three variables that could be used in a linear regression model
# 4) Create a linear regression model on the training dataset using variables
# Age, Kilometer and Manufacturer's Guarantee to predict "Price"
# use "reg" as the name of the model
# See the predicted values and actual values side by side
# plot the residuals
# see the model coefficients and their statistical significance
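The model-fitting steps of question 4 might be sketched as follows. The training data here is simulated so that the block runs without the CSV file, and the column names Age_08_04, KM, and Mfr_Guarantee are assumptions about how age, kilometers, and manufacturer's guarantee are stored; check names(orig.df) before reusing them.

```r
# Simulated stand-in for train.data (NOT the real ToyotaCorolla values)
set.seed(1)
n <- 30
train.data <- data.frame(
  Age_08_04     = sample(1:80, n, replace = TRUE),  # age in months (assumed name)
  KM            = round(runif(n, 1000, 150000)),    # kilometers (assumed name)
  Mfr_Guarantee = rep(c(0, 1), length.out = n)      # 0/1 flag (assumed name)
)
train.data$Price <- 20000 - 120 * train.data$Age_08_04 -
  0.02 * train.data$KM + 500 * train.data$Mfr_Guarantee + rnorm(n, sd = 800)

# Fit the linear regression on the training partition
reg <- lm(Price ~ Age_08_04 + KM + Mfr_Guarantee, data = train.data)

# Predicted and actual values side by side
comparison <- data.frame(
  actual    = train.data$Price,
  predicted = fitted(reg),
  residual  = residuals(reg)
)
head(comparison)

# Residuals against fitted values; drawn only in an interactive session
if (interactive()) {
  plot(fitted(reg), residuals(reg), xlab = "Fitted price", ylab = "Residual")
  abline(h = 0, lty = 2)
}

# Coefficients with standard errors, t values, and p-values
summary(reg)
```

In the summary() output, the Pr(>|t|) column is where the statistical significance of each coefficient would be read off.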
# now use the same model on validation data
# plot the residuals
# compute accuracy on training set
# compute accuracy on validation set
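Accuracy on the two partitions could be compared with a small helper like the one below. The helper name acc.summary is hypothetical, the data is simulated so that the block runs on its own, and the column names Age_08_04, KM, and Mfr_Guarantee are assumptions. Packages such as forecast offer a ready-made accuracy() function, but this sketch stays in base R.

```r
# Simulated stand-in data (NOT the real ToyotaCorolla values)
set.seed(1)
n <- 40
toy.df <- data.frame(
  Age_08_04     = sample(1:80, n, replace = TRUE),
  KM            = round(runif(n, 1000, 150000)),
  Mfr_Guarantee = rep(c(0, 1), length.out = n)
)
toy.df$Price <- 20000 - 120 * toy.df$Age_08_04 - 0.02 * toy.df$KM +
  500 * toy.df$Mfr_Guarantee + rnorm(n, sd = 800)

# The rows are already in random order, so a simple half/half split suffices here
train.data <- toy.df[1:20, ]
valid.data <- toy.df[21:40, ]

reg <- lm(Price ~ Age_08_04 + KM + Mfr_Guarantee, data = train.data)

# Hypothetical helper: error summary for actual vs. predicted prices
acc.summary <- function(actual, predicted) {
  err <- actual - predicted
  c(ME = mean(err), RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}

# Same model, evaluated on the data it was fitted to and on held-out data
train.acc <- acc.summary(train.data$Price, predict(reg, newdata = train.data))
valid.acc <- acc.summary(valid.data$Price, predict(reg, newdata = valid.data))
rbind(train = train.acc, valid = valid.acc)
```

A validation RMSE much larger than the training RMSE would suggest that the model is overfitting the training partition.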
# use the model to make a prediction using new data
# What could be your price estimate for a car 30 months old, 22000 kilometers,
# no manufacturer guarantee?
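The final prediction step might look like the sketch below. The training rows are made up and noise-free, and the column names Age_08_04, KM, and Mfr_Guarantee are assumptions, so the number it produces only illustrates the mechanics and is not the answer for the real dataset.

```r
# Noise-free toy training data (made-up values, NOT the real dataset)
train.data <- data.frame(
  Age_08_04     = c(10, 20, 30, 40, 50, 60, 70, 80),
  KM            = c(15000, 42000, 30000, 61000, 58000, 90000, 75000, 120000),
  Mfr_Guarantee = c(1, 0, 1, 0, 0, 1, 0, 1)
)
train.data$Price <- 20000 - 100 * train.data$Age_08_04 -
  0.05 * train.data$KM + 500 * train.data$Mfr_Guarantee

reg <- lm(Price ~ Age_08_04 + KM + Mfr_Guarantee, data = train.data)

# The new car: 30 months old, 22000 kilometers, no manufacturer guarantee.
# Column names in newdata must match the predictors used in the model.
new.car <- data.frame(Age_08_04 = 30, KM = 22000, Mfr_Guarantee = 0)

price.estimate <- predict(reg, newdata = new.car)
price.estimate
```

On the real data, reg would be the model fitted to the actual training partition, and the same predict() call would return the price estimate the question asks for.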