Question

Posted on Aug 02, 2024

Training Linear Regression Models

Q4) Training a Linear Regression Model

We will now train a linear regression model of the sales data to make useful predictions. Work through the steps below and then answer the following questions. Even though a lot of the code is pre-written, you should understand what it is doing! You may be asked to write some of this code on future assignments.

First we split the data into a training set and a validation set. You should not modify the next two cells, even though there is an edTest comment. The edTest comment is there to let us set up some state, and does not test any functionality. These cells need to be left as-is; otherwise they could break future tests.

    from sklearn.model_selection import train_test_split
    import numpy as np

    # Set seed to create pseudo-randomness
    np.random.seed(416)

    # Split sales data into 75% train and 25% test
    train_data, val_data = train_test_split( put proper variables here )

As posted, this cell fails with:

    SyntaxError: invalid syntax

Let's plot some of the data to get a sense of what we are dealing with. You do not need to understand every part of the plotting code here, but plotting is a good skill in Python, so it will help to read over this.

    import matplotlib.pyplot as plt
    %matplotlib inline

    # Plot sqft_living vs housing price for the train and val dataset
    plt.scatter(train_data['sqft_living'], train_data['price'], marker='+', label='Train')
    plt.scatter(val_data['sqft_living'], val_data['price'], marker='.', label='Validation')

    # Code to customize the axis labels
    plt.legend()
    plt.xlabel("Sqft Living")
    plt.ylabel("Price")

For this problem, we will look at using two sets of features derived from the data inputs. The basic set of features only contains a few data inputs, while the advanced features contain them and more.
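The train/validation split described above can be sketched as follows. This is a minimal illustration, not the assignment's answer: the DataFrame name `sales` and its synthetic contents are assumptions made so the snippet runs on its own, and `test_size=0.25` is one way to express the 75%/25% split stated in the code comment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the assignment's sales data (assumption:
# the real data is a pandas DataFrame with columns such as these).
rng = np.random.default_rng(0)
sales = pd.DataFrame({
    "sqft_living": rng.integers(500, 4000, size=200),
    "bedrooms": rng.integers(1, 6, size=200),
})
sales["price"] = 150 * sales["sqft_living"] + rng.normal(0, 20_000, size=200)

# Seed NumPy's global generator so the shuffle is reproducible;
# train_test_split falls back to it when random_state is not given.
np.random.seed(416)

# Hold out 25% of the rows for validation; the rest is used for training.
train_data, val_data = train_test_split(sales, test_size=0.25)

print(len(train_data), len(val_data))
```

Passing `random_state=416` directly to `train_test_split` would be an alternative way to get a reproducible split, but the cell in the question seeds NumPy globally, so the sketch does the same.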
The basic features are:

    basic_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode']

In the following cell, you should train two linear regression models:

- The first should be saved in a variable called basic_model that only uses the basic features
- The second should be saved in a variable called advanced_model that uses the advanced features

You'll need to look through the LinearRegression class from scikit-learn to see how to train a regression model for this task. In particular, make sure you check out the fit function. Notice that our goal is to eventually make a prediction of how the model will do in the future. You should keep this in mind when deciding which datasets to use where.

    from sklearn.linear_model import LinearRegression

    # TODO Q4 build basic model on basic features (above) and advanced model on advanced features.
    basic_model = LinearRegression().fit()
    advanced_model = LinearRegression().fit()

Now, we will evaluate the models' predictions to see how they perform.

Root Mean Square Error (RMSE) of trained predictors

Q5) What are your Root Mean Squared Errors (RMSE) on your training data using the basic model and the advanced model?

Use the models you trained in the last section to predict the values for the data points. You can look at the model's documentation to see how to make predictions.

The RMSE is another commonly reported metric for regression models. It is similar to the MSE but takes a square root to scale the number down. The RMSE is defined as

$\text{RMSE} = \sqrt{\text{MSE}}$

where the quantity inside the square root is referred to as the Mean Square Error (MSE). There are two ways you can calculate this:

1. Use the mean_squared_error function from sklearn (documentation here)
2. Use numpy's element-wise operations (such as ** for exponent) and mean() for calculating the average

Note: It's more straightforward to use sklearn's predefined functions, but it's more instructive to use numpy for a lower-level implementation, which will be more helpful in the future.

Save your results in variables named train_rmse_basic and train_rmse_advanced respectively. Remember, we want you to report the square root of the MSE numbers.

Q6) What are your RMSE errors on your validation data using the basic model and then the advanced model?

Similar to the last problem, but compute the validation (val) RMSE. Store your results in val_rmse_basic and val_rmse_advanced. Please pay attention to the format function used for output of the result.

    # TODO Q6, add needed code below
    val_rmse_basic =
    val_rmse_advanced =
    print("Validation RMSE, basic = {b}, advanced = {a}".format(b=val_rmse_basic, a=val_rmse_advanced))
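One possible way to approach Q4 through Q6 is sketched below. It is an illustration under stated assumptions, not the verified solution. The `sales` DataFrame is synthetic, the feature lists are shortened, and `advanced_features` (including the `condition` column) is hypothetical, since the actual advanced list does not appear in the question text. The `rmse` helper is also introduced here for illustration only. Both models are fit on the training set only, so that the validation RMSE estimates performance on unseen data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the sales data (assumption, for runnability only).
rng = np.random.default_rng(0)
n = 400
sales = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, size=n),
    "bathrooms": rng.integers(1, 4, size=n),
    "sqft_living": rng.integers(500, 4000, size=n),
    "condition": rng.integers(1, 6, size=n),
})
sales["price"] = (150 * sales["sqft_living"]
                  + 10_000 * sales["condition"]
                  + rng.normal(0, 20_000, size=n))

# Shortened basic list; the advanced list is hypothetical.
basic_features = ["bedrooms", "bathrooms", "sqft_living"]
advanced_features = basic_features + ["condition"]

np.random.seed(416)
train_data, val_data = train_test_split(sales, test_size=0.25)

# Fit on the training rows only: fit(X, y) takes the feature columns and the target.
basic_model = LinearRegression().fit(train_data[basic_features], train_data["price"])
advanced_model = LinearRegression().fit(train_data[advanced_features], train_data["price"])

def rmse(model, data, features):
    """Root mean squared error of model predictions, using NumPy operations."""
    predictions = model.predict(data[features])
    return np.sqrt(np.mean((data["price"] - predictions) ** 2))

train_rmse_basic = rmse(basic_model, train_data, basic_features)
train_rmse_advanced = rmse(advanced_model, train_data, advanced_features)
val_rmse_basic = rmse(basic_model, val_data, basic_features)
val_rmse_advanced = rmse(advanced_model, val_data, advanced_features)

# Cross-check one value with sklearn: the square root of mean_squared_error.
sk_val_rmse_basic = np.sqrt(mean_squared_error(
    val_data["price"], basic_model.predict(val_data[basic_features])))

print("Train RMSE, basic = {b}, advanced = {a}".format(b=train_rmse_basic, a=train_rmse_advanced))
print("Validation RMSE, basic = {b}, advanced = {a}".format(b=val_rmse_basic, a=val_rmse_advanced))
```

Because the advanced features here are a superset of the basic ones, the advanced model's training RMSE cannot be higher than the basic model's. The validation RMSE carries no such guarantee, which is why it is the number to compare when choosing between the two models.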
