Question

1 Approved Answer

Posted on Jul 05, 2024


Module Six Discussion: Multiple Regression

This notebook contains the step-by-step directions for your Module Six discussion. It is very important to run through the steps in order. Some steps depend on the outputs of earlier steps. Once you have completed the steps in this notebook, be sure to answer the questions about this activity in the discussion for this module.

Reminder: If you have not already reviewed the discussion prompt, please do so before beginning this activity. That will give you an idea of the questions you will need to answer with the outputs of this script.

Initial post (due Thursday)

Step 1: Generating cars dataset

This block of Python code will generate the sample data for you. You will not be generating the data set using the numpy module this week. Instead, the data set will be imported from a CSV file. To make the data unique to you, a random sample of size 30, without replacement, will be drawn from the data in the CSV file. The data set will be saved in a Python dataframe that will be used in later calculations.

Click the block of code below and hit the Run button above.

In [1]:
    import pandas as pd
    from IPython.display import display, HTML

    # read data from mtcars.csv data set.
    cars_df_orig = pd.read_csv("https://s3-us-west-2.amazonaws.com/data-analytics.zybooks.com/mtcars.csv")

    # randomly pick 30 observations from the data set to make the data set unique to you.
    cars_df = cars_df_orig.sample(n=30, replace=False)

    # print only the first five observations in the dataset.
    print("Cars data frame (showing only the first five observations)\n")
    display(HTML(cars_df.head().to_html()))

Cars data frame (showing only the first five observations)

[Table output: rows Fiat 128 (17), Merc 450SLC (13), Chrysler Imperial (16), Valiant (5), and Honda Civic; columns mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb.]

Step 2: Scatterplot of miles per gallon against weight

The block of code below will create a scatterplot of the variables "miles per gallon" (coded as mpg in the data set) and "weight" of the car (coded as wt).

Click the block of code below and hit the Run button above.
NOTE: If the plot is not created, click the code section and hit the Run button again.

In [6]:
    import matplotlib.pyplot as plt

    # create scatterplot of variables mpg against wt.
    plt.plot(cars_df["wt"], cars_df["mpg"], 'o', color='red')

    # set a title for the plot, x-axis, and y-axis.
    plt.title('MPG against Weight')
    plt.xlabel('Weight (1000s lbs)')
    plt.ylabel('MPG')

    # show the plot.
    plt.show()

[Scatterplot: MPG against Weight; x-axis: Weight (1000s lbs), y-axis: MPG]

Step 3: Scatterplot of miles per gallon against horsepower

The block of code below will create a scatterplot of the variables "miles per gallon" (coded as mpg in the data set) and "horsepower" of the car (coded as hp).

Click the block of code below and hit the Run button above.
NOTE: If the plot is not created, click the code section and hit the Run button again.

In [3]:
    import matplotlib.pyplot as plt

    # create scatterplot of variables mpg against hp.
    plt.plot(cars_df["hp"], cars_df["mpg"], 'o', color='blue')

    # set a title for the plot, x-axis, and y-axis.
    plt.title('MPG against Horsepower')
    plt.xlabel('Horsepower')
    plt.ylabel('MPG')

    # show the plot.
    plt.show()

[Scatterplot: MPG against Horsepower; x-axis: Horsepower, y-axis: MPG]

Step 4: Correlation matrix for miles per gallon, weight and horsepower

Now you will calculate the correlation coefficient between the variables "miles per gallon" and "weight". You will also calculate the correlation coefficient between the variables "miles per gallon" and "horsepower". The corr method of a dataframe returns the correlation matrix with the correlation coefficients between all variables in the dataframe. You will specify to only return the matrix for the three variables.

Click the block of code below and hit the Run button above.

In [4]:
    # create correlation matrix for mpg, wt, and hp.
    # The correlation coefficient between mpg and wt is contained in the cell for mpg row and wt column (or wt row and mpg column).
    # The correlation coefficient between mpg and hp is contained in the cell for mpg row and hp column (or hp row and mpg column).
    mpg_wt_corr = cars_df[['mpg', 'wt', 'hp']].corr()
    print(mpg_wt_corr)

              mpg        wt        hp
    mpg  1.000000 -0.876508 -0.817420
    wt  -0.876508  1.000000  0.746177
    hp  -0.817420  0.746177  1.000000

Step 5: Multiple regression model to predict miles per gallon using weight and horsepower

This block of code produces a multiple regression model with "miles per gallon" as the response variable, and "weight" and "horsepower" as predictor variables. The ols method in the statsmodels.formula.api submodule returns all statistics for this multiple regression model.

Click the block of code below and hit the Run button above.

In [5]:
    from statsmodels.formula.api import ols

    # create the multiple regression model with mpg as the response variable; weight and horsepower as predictor variables.
    model = ols('mpg ~ wt+hp', data=cars_df).fit()
    print(model.summary())

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                    mpg   R-squared:                       0.828
    Model:                            OLS   Adj. R-squared:                  0.816
    Method:                 Least Squares   F-statistic:                     65.22
    Date:                Tue, 06 Apr 2021   Prob (F-statistic):           4.60e-11
    Time:                        20:19:43   Log-Likelihood:                -69.895
    No. Observations:                  30   AIC:                             145.8
    Df Residuals:                      27   BIC:                             150.0
    Df Model:                           2
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept     37.2011      1.619     22.984      0.000      33.880      40.522
    wt            -3.6386      0.724     -5.024      0.000      -5.125      -2.153
    hp            -0.0378      0.012     -3.079      0.005      -0.063      -0.013
    ==============================================================================
    Omnibus:                        5.855   Durbin-Watson:                   2.185
    Prob(Omnibus):                  0.054   Jarque-Bera (JB):                4.433
    Skew:                           0.917   Prob(JB):                        0.109
    Kurtosis:                       3.426   Cond. No.                         546.
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

End of initial post

Attach the HTML output to your initial post in the Module Six discussion. The HTML output can be downloaded by clicking File, then Download as, then HTML. Be sure to answer all questions about this activity in the Module Six discussion.

Follow-up posts (due Sunday)

Return to the Module Six discussion to answer the follow-up questions in your response posts to other students. There are no Python scripts to run for your follow-up posts.
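
As a rough illustration that is not part of the original notebook, the Step 5 regression output can be read as the equation mpg = 37.2011 - 3.6386*wt - 0.0378*hp. The sketch below evaluates that equation and recomputes the adjusted R-squared and F-statistic from the printed R-squared using the standard formulas. The function names (predict_mpg, adjusted_r_squared, f_statistic) and the example car (3,000 lbs, 110 hp) are hypothetical. The coefficients are copied from the printed summary, so they apply only to that particular random sample of 30 cars and are rounded.

```python
# Coefficients copied from the OLS summary in Step 5 (rounded as printed).
# They belong to one particular random sample; a rerun of Step 1 gives others.
INTERCEPT = 37.2011
B_WT = -3.6386   # change in mpg per additional 1000 lbs, holding hp fixed
B_HP = -0.0378   # change in mpg per additional horsepower, holding wt fixed


def predict_mpg(wt, hp):
    """Predicted mpg for a car with weight wt (1000s lbs) and horsepower hp."""
    return INTERCEPT + B_WT * wt + B_HP * hp


def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F-statistic computed from R-squared, n observations, p predictors."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# Hypothetical car: 3,000 lbs and 110 hp.
print(round(predict_mpg(3.0, 110), 2))

# Values from the summary: R-squared = 0.828, n = 30, p = 2.
print(round(adjusted_r_squared(0.828, 30, 2), 3))
print(round(f_statistic(0.828, 30, 2), 2))
```

Because the inputs are the rounded values printed in the summary, the recomputed adjusted R-squared and F-statistic land close to, but not exactly on, the reported 0.816 and 65.22.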