Question

Python!!! Python homework for Regression Model

I have provided the original data set and part of the code. I hope that you can help me with Questions d_1, d_2, f, g, and h.

Thank you!

In this problem we will explore our first data set using pandas (for loading and processing our data) and sklearn (for building machine learning models).

==================Code Chunk======================

from sklearn.linear_model import LinearRegression
import pandas as pd
import pylab as plt
import seaborn
import numpy.random as nprnd
import random

%matplotlib inline

df = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0)
df.head()

==================Code Chunk======================

Problem: Predict sales using sklearn

  • Split data into training and testing subsets.
  • Train model using LinearRegression() from sklearn.linear_model on training data.
  • Evaluate using RMSE and R^2 on testing set

====================Code Chunk==========================

from sklearn.linear_model import LinearRegression

# Set y to be the sales in df

y = df['sales']

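# Set X to be just the features described above in df, and also create a new column called intercept, which is just 1.

# Note: drop() takes the axis by keyword; the positional form df.drop(['sales'], 1) fails on pandas 2.x.
X = df.drop(columns=['sales'])
X['intercept'] = 1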

# Randomly split data into training and testing - 80% training, 20% testing.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

====================Code Chunk==========================
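The three steps in the problem statement (split, fit, evaluate on the held-out set) can be sketched roughly as follows. This is only an illustration, not the expected solution: it uses a synthetic stand-in for the advertising table so that it runs without the download, and the coefficients used to generate that data, as well as the names `demo`, `X_demo`, and `y_demo`, are made up.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (assumption: NOT the real values).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "radio": rng.uniform(0, 50, n),
    "newspaper": rng.uniform(0, 100, n),
})
# Made-up relationship: sales depend on TV and radio, plus noise.
demo["sales"] = (3.0 + 0.045 * demo["TV"] + 0.19 * demo["radio"]
                 + rng.normal(0, 1.5, n))

X_demo = demo[["TV", "radio", "newspaper"]]
y_demo = demo["sales"]

# 80% training, 20% testing, same settings as the chunk above.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# LinearRegression fits its own intercept by default,
# so no explicit column of ones is needed here.
clf = LinearRegression()
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# RMSE is the square root of the mean squared error on the test set.
rmse = np.sqrt(mean_squared_error(y_te, y_pred))
r2 = r2_score(y_te, y_pred)
print(f"RMSE = {rmse:.3f}, R^2 = {r2:.3f}")
```

If the feature matrix already carries an explicit intercept column of ones, passing `fit_intercept=False` to `LinearRegression` avoids fitting the constant term twice.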

Please provide python code and relevant answers. (Screenshots of your Jupyter Notebook are okay!!! )

Out[3]:

       TV  radio  newspaper  sales
1   230.1   37.8       69.2   22.1
2    44.5   39.3       45.1   10.4
3    17.2   45.9       69.3    9.3
4   151.5   41.3       58.5   18.5
5   180.8   10.8       58.4   12.9

What are the features (variables, covariates; all mean the same thing)?

  • TV: advertising dollars spent on TV for a single product in a given market (in thousands of dollars)
  • Radio: advertising dollars spent on radio
  • Newspaper: advertising dollars spent on newspaper
  • Sales: number of units sold, in thousands

Goal: Predict the amount of sales in a given market based on the advertising in TV, radio, and newspaper.

[5 points] d_1) Train the model on the training data, and make predictions on the testing data, using our solution from class. It will be useful to use np.linalg.inv.

# Code here

[5 points] d_2) Train the model on the training data, and make predictions on the testing data, using sklearn.linear_model.LinearRegression. Make sure your answer matches part d_1).

# Code here

[5 points] f) Interpreting the coefficients of your model (clf.coef_), which form of advertising appears to have the largest impact on sales? Which has the least impact?

# Answer here

[10 points] g) Plot the coefficients along with their confidence intervals, recalling that the variances of the coefficients are the diagonal elements of the covariance matrix $\hat{\sigma}^2 (X^T X)^{-1}$, where $\hat{\sigma}^2$ is the estimated variance of the residuals. Ensure you obtain the same results for the variance of the coefficients as when you use

    import scipy, scipy.stats
    import statsmodels.api as sm
    result = sm.OLS(y, X).fit()
    result.summary()

# Code here

[10 points] h) Repeat the steps above, but build a separate model for each individual feature, i.e. X = df[col], where col is one of the variables TV, radio, and newspaper. Based on this analysis, which feature now appears to have more of an influence on sales? Which has practically none? Provide an interpretation of this apparent contradiction. Hint: It may be useful to check the correlation matrix using df.corr() to understand how the covariates relate to one another.

# Code and Answer here
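For parts d_1 and g, a closed-form fit and the coefficient covariance can be sketched as below. This assumes that "our solution from class" refers to the usual normal-equations estimate, which the question does not spell out, and it again runs on synthetic data with made-up coefficients rather than on the advertising table. The variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption: made-up coefficients, not the advertising data).
rng = np.random.default_rng(1)
n = 160
features = rng.uniform(0, 100, size=(n, 3))
# Design matrix: three features plus an explicit intercept column of ones.
X = np.column_stack([features, np.ones(n)])
true_beta = np.array([0.05, 0.2, 0.0, 3.0])
y = X @ true_beta + rng.normal(0, 1.0, n)

# Normal equations: beta_hat = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Residual variance, estimated with n - p degrees of freedom
# (p = number of columns in X, including the intercept).
residuals = y - X @ beta_hat
dof = n - X.shape[1]
sigma2_hat = (residuals @ residuals) / dof

# Covariance of the coefficients; its diagonal holds their variances.
cov_beta = sigma2_hat * XtX_inv
std_err = np.sqrt(np.diag(cov_beta))

# 95% confidence intervals from the t distribution.
t_crit = stats.t.ppf(0.975, dof)
ci_low = beta_hat - t_crit * std_err
ci_high = beta_hat + t_crit * std_err

for name, b, lo, hi in zip(["x1", "x2", "x3", "intercept"],
                           beta_hat, ci_low, ci_high):
    print(f"{name:>9}: {b:8.4f}  [{lo:8.4f}, {hi:8.4f}]")
```

The intervals can be drawn with `plt.errorbar`, passing `beta_hat` as the values and `t_crit * std_err` as `yerr`. The standard errors should match the `std err` column of the statsmodels summary when the same design matrix, including the intercept column, is passed to `sm.OLS`.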
