
Question


Training Linear Regression Models

Q4) Training a Linear Regression Model.

We will now train a linear regression model of the sales data to make useful predictions. Work through the steps below and then answer the following questions. Even though a lot of the code is pre-written, you should understand what it is doing! You may be asked to write some of this code on future assignments.

First we split the data into a training set and a validation set. You should not modify the next two cells, even though there is an edTest comment. The edTest comment is there to let us set up some state, and does not test any functionality. These cells need to be left as-is, otherwise it will potentially mess up future tests.

    from sklearn.model_selection import train_test_split
    import numpy as np

    # Set seed to create pseudo-randomness
    np.random.seed(416)

    # Split sales data into 75% train and 25% test
    train_data, val_data = train_test_split( put proper variables here )

Output:

    train_data, val_data = train_test_split( )
    SyntaxError: invalid syntax

Let's plot some of the data to get a sense of what we are dealing with. You do not need to understand every part of the plotting code here, but plotting is a good skill in Python so it will help to read over this.

    import matplotlib.pyplot as plt
    %matplotlib inline

    # Plot sqft_living vs housing price for the train and val dataset
    plt.scatter(train_data['sqft_living'], train_data['price'], marker='+', label='Train')
    plt.scatter(val_data['sqft_living'], val_data['price'], marker='.', label='Validation')

    # Code to customize the axis labels
    plt.legend()
    plt.xlabel("Sqft Living")
    plt.ylabel("Price")

For this problem, we will look at using two sets of features derived from the data inputs. The basic set of features only contains a few data inputs while the advanced features contain them and more.

    basic_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode']

In the following cell, you should train two linear regression models:
- The first should be saved in a variable called basic_model that only uses the basic features
- The second should be saved in a variable called advanced_model that uses the advanced features

You'll need to look through the LinearRegression class from scikit-learn to see how to train a regression model for this task. In particular, make sure you check out the fit function.

Notice that our goal is to eventually make a prediction of how the model will do in the future. You should keep this in mind when deciding which datasets to use where.

    from sklearn.linear_model import LinearRegression

    # TODO Q4: build basic model on basic features (above) and advanced model on advanced features.
    basic_model = LinearRegression().fit()
    advanced_model = LinearRegression().fit()

Now, we will evaluate the models' predictions to see how they perform.

Root Mean Square Error (RMSE) of trained predictors

Q5) What are your Root Mean Squared Errors (RMSE) on your training data using the basic model and the advanced model?

Use the models you trained in the last section to predict the values for the data points. You can look at the documentation for the model to see how to make predictions.

The RMSE is another commonly reported metric used for regression models. The RMSE is similar to MSE but is modified slightly to scale the number down. The RMSE is defined as

$\text{RMSE} = \sqrt{\text{MSE}}$

where the quantity inside the square root is referred to as the Mean Square Error (MSE). There are two ways you can calculate this:
1. Use the mean_squared_error function from sklearn (see its documentation)
2. Use numpy's element-wise operations (such as ** for exponentiation) and np.mean for calculating the average

Note: Using sklearn's predefined functions is more straightforward, but writing the lower-level numpy implementation yourself is better practice and will be more helpful in the future.

Save your result in variables named train_rmse_basic and train_rmse_advanced respectively. Remember, we want you to report the square root of the MSE numbers.

Q6) What are your RMSE errors on your validation data using the basic model and then the advanced model?

Similar to the last problem, but compute the validation (val) RMSE. Store your results in val_rmse_basic and val_rmse_advanced. Please pay attention to the format function used for output of the result.

    # TODO Q6, add needed code below
    val_rmse_basic =
    val_rmse_advanced =
    print("Validation RMSE, basic {b}, advanced = {a} ".format(b=train_rmse_basic, a=train_rmse_advanced))
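
The Q4 to Q6 steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the assignment's reference solution: the real sales data is not shown in the question, so `sales` here is a small synthetic DataFrame with made-up values, and the helper names `rmse_sklearn` and `rmse_numpy` are hypothetical. The list of advanced features is also not given in the question, so only the basic model is shown; the advanced model would presumably follow the same pattern with its own feature list.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Seed NumPy's global generator, as the assignment does.
np.random.seed(416)

# ASSUMPTION: synthetic stand-in for the sales data, which the question does not show.
n = 200
sales = pd.DataFrame({
    "bedrooms": np.random.randint(1, 6, size=n),
    "bathrooms": np.random.randint(1, 4, size=n),
    "sqft_living": np.random.randint(500, 4000, size=n),
    "sqft_lot": np.random.randint(1000, 10000, size=n),
    "floors": np.random.randint(1, 3, size=n),
    "zipcode": np.random.randint(98001, 98200, size=n),
})
sales["price"] = 150 * sales["sqft_living"] + np.random.normal(0, 20000, size=n)

basic_features = ["bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors", "zipcode"]

# 75% train / 25% validation split.
train_data, val_data = train_test_split(sales, test_size=0.25)

# Fit on the training set only, so the validation set stays unseen.
basic_model = LinearRegression().fit(train_data[basic_features], train_data["price"])


def rmse_sklearn(model, data, features):
    """RMSE via sklearn's mean_squared_error, followed by a square root."""
    predictions = model.predict(data[features])
    return np.sqrt(mean_squared_error(data["price"], predictions))


def rmse_numpy(model, data, features):
    """RMSE via NumPy element-wise operations: sqrt(mean((y - y_hat) ** 2))."""
    predictions = model.predict(data[features])
    return np.sqrt(np.mean((data["price"] - predictions) ** 2))


train_rmse_basic = rmse_sklearn(basic_model, train_data, basic_features)
val_rmse_basic = rmse_sklearn(basic_model, val_data, basic_features)

print("Train RMSE, basic = {b}".format(b=train_rmse_basic))
print("Validation RMSE, basic = {b}".format(b=val_rmse_basic))
```

Both helpers compute the same quantity, so they can be used to cross-check each other. The validation RMSE is the number that estimates future performance, because the model never saw those rows during fitting.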
