Question
Python!!! Python homework for Regression Model I have provided the original data set, and part of the code. I hope that you can help me
Python!!! Python homework for Regression Model
I have provided the original data set, and part of the code. I hope that you can help me with Question d_1, d_2, f, g, h.
Thank you!
In this problem we will explore our first data set using pandas (for loading and processing our data) and sklearn (for building machine learning models).
==================Code Chunk======================
from sklearn.linear_model import LinearRegression import pandas as pd import pylab as plt import seaborn import numpy.random as nprnd import random %matplotlib inline df = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0) df.head()
==================Code Chunk======================
Probelm : Predict sales using sklearn
- Split data into training and testing subsets.
- Train model using LinearRegression() from sklearn.linear_model on training data.
- Evaluate using RMSE and R^2 on testing set
====================Code Chunk==========================
from sklearn.linear_model import LinearRegression
# Set y to be the sales in df
y = df['sales']
# Set X to be just the features described above in df, also create a new column called interecept which is just 1.
X = df.drop(['sales'],1)
# Randomly split data into training and testing - 80% training, 20% testing.
from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
====================Code Chunk==========================
Please provide python code and relevant answers. (Screenshots of your Jupyter Notebook are okay!!! )
Out [3]: 1 230.1 37.8 2 44.5 39.3 3 17.2 45.9 4 151.5 41.3 5 180.8 10.8 TV radio newspaper sales 69.2 22.1 45.1 10.4 69.3 9.3 58.5 18.5 58.4 12.9 What are the features (variables, covariates, all mean the same thing)? TV: advertising dollars spent on TV for a single product in a given market (in thousands of dollars) Radio: advertising dollars spent on Radio Newspaper: advertising dollars spent on Newspaper . Sales: Number of 1k units sold Goal: Predict the amount of sales in a given market based on the advertising in TV, Radio and Newspaper. [5 points] d_1) Train model on training data, and make predictions on testing data, using our solution from class It will be useful to use np. 1linalg. inverse. In [19]: # Code here [5 points] d 2) Train model on training data, and make predictions on testing data, using sklearn. linear_model. LinearRegression . Make sure your answer matches part d 1) In [79]: # Code here [5 points] f Interpreting the coefficients of your model ( clf. coef_1 ), which form of advertising appears to have the largest impact on sales? Which has the least impact? In [ ] : # Answer here [10 points] g) Plot the coefficients along with their confidence intervals, recalling that The variance of the coefficients are the diagonal elemements of the covariance matrix 2(X residuals -1, where is the estimated Ensure you obtain the same results for the variance of the coefficients as when you use import scipy, scipy. stats result-sm. OLS ( y, X ), fit() result. summary O In [ ]: # Code here [10 points] h) Repeat the steps above but build a seperate model for each individual feature, ie. X df [col] where col is one of the variables TV, radio and newspaper. Based on this analysis, which feature now appears to have more of an influence on sales? Which has practically none? Provide an interpretation of this apparent contradiction. Hint: It may be useful to check the correlation matrix using df. corr another and to understand how the covariates relate to one In [ ] : # Code and Answer here Out [3]: 1 230.1 37.8 2 44.5 39.3 3 17.2 45.9 4 151.5 41.3 5 180.8 10.8 TV radio newspaper sales 69.2 22.1 45.1 10.4 69.3 9.3 58.5 18.5 58.4 12.9 What are the features (variables, covariates, all mean the same thing)? TV: advertising dollars spent on TV for a single product in a given market (in thousands of dollars) Radio: advertising dollars spent on Radio Newspaper: advertising dollars spent on Newspaper . Sales: Number of 1k units sold Goal: Predict the amount of sales in a given market based on the advertising in TV, Radio and Newspaper. [5 points] d_1) Train model on training data, and make predictions on testing data, using our solution from class It will be useful to use np. 1linalg. inverse. In [19]: # Code here [5 points] d 2) Train model on training data, and make predictions on testing data, using sklearn. linear_model. LinearRegression . Make sure your answer matches part d 1) In [79]: # Code here [5 points] f Interpreting the coefficients of your model ( clf. coef_1 ), which form of advertising appears to have the largest impact on sales? Which has the least impact? In [ ] : # Answer here [10 points] g) Plot the coefficients along with their confidence intervals, recalling that The variance of the coefficients are the diagonal elemements of the covariance matrix 2(X residuals -1, where is the estimated Ensure you obtain the same results for the variance of the coefficients as when you use import scipy, scipy. stats result-sm. OLS ( y, X ), fit() result. summary O In [ ]: # Code here [10 points] h) Repeat the steps above but build a seperate model for each individual feature, ie. X df [col] where col is one of the variables TV, radio and newspaper. Based on this analysis, which feature now appears to have more of an influence on sales? Which has practically none? Provide an interpretation of this apparent contradiction. Hint: It may be useful to check the correlation matrix using df. corr another and to understand how the covariates relate to one In [ ] : # Code and Answer hereStep by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started