Question

```python
# numpy and pandas
import numpy as np
import pandas as pd
import math

# graphics with matplotlib
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')  # this style was called 'seaborn' before matplotlib 3.6
%matplotlib inline
# (IPython/Jupyter magic; remove it in a plain .py script)

# model, train/test split, dummies (one-hot-encoding), rmse metric from scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import mean_squared_error
```
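LabelBinarizer is imported for the color dummies. Here is a small sketch of what it produces. The color labels are made up for illustration. One caveat: with exactly two distinct labels, LabelBinarizer returns a single 0/1 column instead of one column per label.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# A few example color labels (invented for illustration).
colors = np.array(["Black", "Silver", "White", "other", "Black"])

lb = LabelBinarizer()
dummies = lb.fit_transform(colors)  # one 0/1 column per distinct label (3+ labels)

# classes_ are sorted, so the column order is Black, Silver, White, other
print(lb.classes_)
print(dummies)

# Alongside an intercept, leave one level out (here the first, Black) as the
# baseline so the dummy columns are not perfectly collinear with the intercept.
X_color = dummies[:, 1:]
```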

Get the data

Let's read in the used cars data. We will use only two features, mileage and color, and rescale price and mileage to thousands.

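```python
# Read the used cars data and keep only price, mileage and color.
cd = pd.read_csv("https://bitbucket.org/remcc/rob-data-sets/downloads/susedcars.csv")
cd = cd[['price', 'mileage', 'color']]

# Rescale price and mileage to thousands.
cd['price'] = cd['price'] / 1000
cd['mileage'] = cd['mileage'] / 1000
cd.head()
```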

We are fitting the model:

$$\text{price} = \beta_0 + \beta_1\,\text{mileage} + \beta_2\,\text{mileage}^2 + \epsilon$$

What do you think?
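The model is quadratic in mileage but linear in the coefficients. That means ordinary LinearRegression can fit it once mileage squared is added as an extra column. The sketch below uses simulated data so it runs without downloading the CSV. The "true" coefficients are invented; with the real data you would use cd['mileage'] and cd['price'] instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulated stand-in for the used-car data (mileage and price in thousands).
# The coefficients 55, -0.5 and 0.001 are invented for illustration only.
n = 500
mileage = rng.uniform(0, 150, size=n)
price = 55 - 0.5 * mileage + 0.001 * mileage**2 + rng.normal(0, 3, size=n)

# Design matrix with columns [mileage, mileage^2]; LinearRegression fits beta_0 itself.
X = np.column_stack([mileage, mileage**2])
model = LinearRegression().fit(X, price)

beta0 = model.intercept_
beta1, beta2 = model.coef_
print(f"beta0={beta0:.2f}, beta1={beta1:.4f}, beta2={beta2:.6f}")
```

The estimates should land close to the simulated values. A positive beta2 means the price drop per extra thousand miles shrinks as mileage grows.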

Homework:

Use out-of-sample performance (a train/test split) to decide which of these two models is better:

linear model of log(y) on mileage and color

linear model of y on mileage, mileage squared, and color
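Here is one way to write the two candidate models in the notation of the equation above. In it, $D_k$ are color dummies with one color left out as the baseline, and each model has its own separately estimated coefficients. Model 1 predicts on the log scale, so its predictions must be exponentiated before computing a test-set RMSE in price units. Note that $\exp$ of the predicted log price is the simple back-transform. If the log-scale errors are normal, it estimates the conditional median rather than the mean.

```latex
\begin{aligned}
&\text{Model 1:}\quad \log(\text{price}) = \beta_0 + \beta_1\,\text{mileage} + \sum_{k} \gamma_k D_k + \epsilon \\
&\text{Model 2:}\quad \text{price} = \beta_0 + \beta_1\,\text{mileage} + \beta_2\,\text{mileage}^2 + \sum_{k} \gamma_k D_k + \epsilon \\
&\widehat{\text{price}}^{(1)} = \exp\!\big(\widehat{\log \text{price}}\big), \qquad
\text{RMSE} = \sqrt{\frac{1}{n_{\text{test}}} \sum_{i \in \text{test}} \big(\text{price}_i - \widehat{\text{price}}_i\big)^2}
\end{aligned}
```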

Use out-of-sample RMSE and graphics to compare the two models.
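Here is a minimal sketch of the out-of-sample RMSE comparison. It assumes a DataFrame with the same three columns as cd. To run offline, it simulates a stand-in `df`; the simulated numbers are invented and say nothing about which model wins on the real cars. The helpers `make_features` and `rmse` are hypothetical names, and the color dummies come from `pd.get_dummies` rather than LabelBinarizer. Two details matter here:

- Both models are scored on the same test rows.
- The log model's predictions are exponentiated, so both RMSEs are in thousands of dollars.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


def make_features(df, quadratic):
    """Design matrix: mileage, optionally mileage^2, plus color dummies."""
    X = pd.DataFrame({"mileage": df["mileage"]})
    if quadratic:
        X["mileage2"] = df["mileage"] ** 2
    # drop_first=True leaves one color out as the baseline level
    dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=float)
    return pd.concat([X, dummies], axis=1)


def rmse(y_true, y_pred):
    # Square root of the mean squared error, in the units of price.
    return np.sqrt(mean_squared_error(y_true, y_pred))


# Hypothetical stand-in with the same columns as cd (price and mileage in 1000s).
# All numbers below are invented so the sketch runs without the CSV.
rng = np.random.default_rng(42)
n = 1000
mileage = rng.uniform(1, 150, n)
color = rng.choice(["Black", "Silver", "White", "other"], size=n)
shift = pd.Series(color).map({"Black": 0.0, "Silver": 0.03, "White": -0.02, "other": -0.05})
price = np.exp(np.log(60) - 0.012 * mileage + shift.to_numpy() + rng.normal(0, 0.15, n))
df = pd.DataFrame({"price": price, "mileage": mileage, "color": color})

# Build features on the full data first so train and test share the same dummy columns.
X_lin = make_features(df, quadratic=False)
X_quad = make_features(df, quadratic=True)
y = df["price"].to_numpy()

# One split, reused for both models, so they are scored on the same test cars.
idx_train, idx_test = train_test_split(np.arange(len(df)), test_size=0.25, random_state=34)

# Model 1: log(price) on mileage + color; exp() puts predictions back on the price scale.
m_log = LinearRegression().fit(X_lin.iloc[idx_train], np.log(y[idx_train]))
pred_log = np.exp(m_log.predict(X_lin.iloc[idx_test]))

# Model 2: price on mileage, mileage^2 + color.
m_quad = LinearRegression().fit(X_quad.iloc[idx_train], y[idx_train])
pred_quad = m_quad.predict(X_quad.iloc[idx_test])

rmse_log = rmse(y[idx_test], pred_log)
rmse_quad = rmse(y[idx_test], pred_quad)
print(f"out-of-sample RMSE  log model: {rmse_log:.3f}  quadratic model: {rmse_quad:.3f}")
```

With the real data, drop the simulation and set `df = cd`. The model with the smaller test-set RMSE predicts better out of sample. Because a single split can be noisy, repeating it with a few different `random_state` values is a cheap robustness check.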

How would I code this in Python to check the two models?
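For the graphics part, here is a sketch of one reasonable pair of plots for the test set:

- **Left panel:** predicted vs. actual price for both models.
- **Right panel:** actual and predicted price against mileage.

`plot_comparison` is a hypothetical helper, and the stand-in arrays at the bottom are invented just to exercise it. In practice you would pass the test-set prices, test-set mileage and the two prediction vectors from your train/test split, with the log model's predictions already exponentiated.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_comparison(y_test, pred_log, pred_quad, mileage_test):
    """Side-by-side plots of test-set predictions from the two models."""
    fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))

    # Left: predicted vs. actual; points on the dashed line are perfect predictions.
    ax = axes[0]
    ax.scatter(y_test, pred_log, s=10, alpha=0.5, label="log(price) model")
    ax.scatter(y_test, pred_quad, s=10, alpha=0.5, label="quadratic model")
    lo = min(y_test.min(), pred_log.min(), pred_quad.min())
    hi = max(y_test.max(), pred_log.max(), pred_quad.max())
    ax.plot([lo, hi], [lo, hi], "k--", linewidth=1)
    ax.set_xlabel("actual price (test set, 1000s of dollars)")
    ax.set_ylabel("predicted price (1000s of dollars)")
    ax.legend()

    # Right: data and predictions against mileage, to see where each model bends.
    ax = axes[1]
    ax.scatter(mileage_test, y_test, s=10, color="gray", alpha=0.4, label="test data")
    ax.scatter(mileage_test, pred_log, s=10, alpha=0.6, label="log(price) model")
    ax.scatter(mileage_test, pred_quad, s=10, alpha=0.6, label="quadratic model")
    ax.set_xlabel("mileage (1000s)")
    ax.set_ylabel("price (1000s of dollars)")
    ax.legend()

    fig.tight_layout()
    return fig


# Invented stand-in arrays, only to show the call.
rng = np.random.default_rng(1)
mileage_test = rng.uniform(1, 150, 200)
y_test = 60 * np.exp(-0.012 * mileage_test) + rng.normal(0, 2, 200)
pred_log = 60 * np.exp(-0.012 * mileage_test)
pred_quad = 58 - 0.6 * mileage_test + 0.0022 * mileage_test**2

fig = plot_comparison(y_test, pred_log, pred_quad, mileage_test)
```

In a notebook with `%matplotlib inline` the figure displays directly; in a script, call `plt.show()` or `fig.savefig(...)`. The right panel is where the two models usually differ most visibly. The quadratic curve can turn upward at high mileage, while the exponentiated log model keeps decreasing.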
