Question
# numpy and pandas
import numpy as np
import pandas as pd
import math

# graphics with matplotlib
import matplotlib.pyplot as plt
# Note: matplotlib 3.6 renamed this style to 'seaborn-v0_8' and 3.8 removed
# the old name; use the new name if this line raises an error.
plt.style.use('seaborn')
%matplotlib inline

# model, train/test split, dummies (one-hot encoding), rmse metric from scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import mean_squared_error
Get the data
Let's read in the used cars data. We will use only the features mileage and color.
cd = pd.read_csv("https://bitbucket.org/remcc/rob-data-sets/downloads/susedcars.csv")
cd = cd[['price','mileage','color']]
cd['price'] = cd['price']/1000
cd['mileage'] = cd['mileage']/1000
cd.head()
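Because color is categorical, it has to be turned into 0/1 indicator (dummy) columns before it can enter a linear regression. The following is a minimal sketch of one way to do that with `pd.get_dummies`; the helper name `add_color_dummies` is hypothetical and not part of the original notebook, which imports `LabelBinarizer` for the same purpose.

```python
import pandas as pd


def add_color_dummies(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with `color` replaced by 0/1 indicator columns.

    Hypothetical helper for illustration; assumes df has a `color` column.
    """
    # drop_first=True leaves one color out as the baseline level, so the
    # indicators are not collinear with the regression intercept.
    dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=float)
    return pd.concat([df.drop(columns="color"), dummies], axis=1)
```

With the data frame from above, `add_color_dummies(cd).head()` should show price, mileage, and one indicator column for each color except the baseline.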
We are fitting the model:
$\text{price} = \beta_0 + \beta_1\,\text{mileage} + \beta_2\,\text{mileage}^2 + \epsilon$
What do you think?
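The quadratic model is still linear in its coefficients, so it can be fit by ordinary least squares once mileage squared is added as an extra column. A minimal sketch, assuming a data frame with `price` and `mileage` columns as above; the function name `fit_quadratic` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_quadratic(df: pd.DataFrame) -> LinearRegression:
    """Fit price = b0 + b1*mileage + b2*mileage^2 by least squares.

    Hypothetical helper for illustration only.
    """
    # Two feature columns: mileage and mileage squared.
    X = np.column_stack([df["mileage"], df["mileage"] ** 2])
    return LinearRegression().fit(X, df["price"])
```

After `model = fit_quadratic(cd)`, `model.intercept_` holds the estimate of the intercept and `model.coef_` holds the estimates of the mileage and mileage-squared coefficients, in that order.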
Homework:
Use out-of-sample performance (a train/test split) to decide which of these two models is better:
linear model of log(y) on mileage and color
linear model of y on mileage, mileage squared, and color
Use out-of-sample RMSE and graphics to compare the two models.
How would I code this in Python to compare the two models?
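One possible approach is sketched below; it is not a definitive solution. Both models are fit on the same training rows and scored on the same test rows. The key assumption is that the log model's predictions should be exponentiated back to the price scale before computing RMSE, so that the two errors are in the same units (thousands of dollars, given the rescaling above). The function names, the 25% test fraction, and the seed are all illustrative choices, not part of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def design_matrices(df):
    """Build feature matrices for the two candidate models (hypothetical helper)."""
    # One-hot encode color, dropping one level as the baseline.
    colors = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=float)
    X_log = pd.concat([df[["mileage"]], colors], axis=1)    # mileage + color
    X_quad = X_log.assign(mileage_sq=df["mileage"] ** 2)    # adds mileage^2
    return X_log, X_quad


def compare_models(df, test_size=0.25, seed=0):
    """Return out-of-sample RMSE on the price scale and test-set predictions."""
    X_log, X_quad = design_matrices(df)
    # Split row positions once so both models see identical train/test rows.
    train_idx, test_idx = train_test_split(
        np.arange(len(df)), test_size=test_size, random_state=seed
    )
    y_train = df["price"].iloc[train_idx]
    y_test = df["price"].iloc[test_idx]

    # Model 1: regress log(price), then map predictions back with exp().
    m_log = LinearRegression().fit(X_log.iloc[train_idx], np.log(y_train))
    pred_log = np.exp(m_log.predict(X_log.iloc[test_idx]))

    # Model 2: regress price on mileage, mileage squared, and color.
    m_quad = LinearRegression().fit(X_quad.iloc[train_idx], y_train)
    pred_quad = m_quad.predict(X_quad.iloc[test_idx])

    rmse = {
        "log": float(np.sqrt(mean_squared_error(y_test, pred_log))),
        "quadratic": float(np.sqrt(mean_squared_error(y_test, pred_quad))),
    }
    preds = pd.DataFrame({
        "mileage": df["mileage"].iloc[test_idx],
        "price": y_test,
        "log": pred_log,
        "quadratic": pred_quad,
    })
    return rmse, preds


def plot_predictions(preds):
    """Scatter test prices and both models' predictions against mileage."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for the plot

    fig, ax = plt.subplots()
    ax.scatter(preds["mileage"], preds["price"], s=8, alpha=0.4, label="test data")
    ax.scatter(preds["mileage"], preds["log"], s=8, label="log model")
    ax.scatter(preds["mileage"], preds["quadratic"], s=8, label="quadratic model")
    ax.set_xlabel("mileage (thousands)")
    ax.set_ylabel("price (thousands)")
    ax.legend()
    return ax
```

With the data frame from above, `rmse, preds = compare_models(cd)` followed by `print(rmse)` and `plot_predictions(preds)` gives the two test-set errors and a picture of how each model tracks the held-out prices. The model with the lower out-of-sample RMSE is preferred under this criterion, although a single split can be noisy, so repeating the comparison with a few different seeds is a reasonable check.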