Question
Use the link in the Jupyter Notebook activity to access your Python script. Once you have made your calculations, complete this discussion. The script will output answers to the questions given below. You must attach your Python script output as an HTML file and respond to the questions below.
In this discussion, you will apply the statistical concepts and techniques covered in this week's reading about correlation coefficient and simple linear regression. A car rental company wants to evaluate the premise that heavier cars are less fuel efficient than lighter cars. In other words, the company expects that fuel efficiency (miles per gallon) and weight of the car (often measured in thousands of pounds) are correlated. Performing this analysis will help the company optimize its business model and charge its customers appropriately.
In this discussion, you will work with a cars data set that includes two variables:
- Miles per gallon (coded as mpg in the data set)
- Weight of the car (coded as wt in the data set)
The random sample will be drawn from a CSV file. This data will be unique to you, and therefore your answers will be unique as well. Run Step 1 in the Python script to generate your unique sample data.
In your initial post, address the following items:
- You created a scatterplot of miles per gallon against weight; check to make sure it was included in your attachment. Does the graph show any trend? If yes, is the trend what you expected? Why or why not? See Step 2 in the Python script.
- What is the coefficient of correlation between miles per gallon and weight? What is the sign of the correlation coefficient? Does the coefficient of correlation indicate a strong correlation, weak correlation, or no correlation between the two variables? How do you know? See Step 3 in the Python script.
- Write the simple linear regression equation for miles per gallon as the response variable and weight as the predictor variable. How might the car rental company use this model? See Step 4 in the Python script.
- What is the slope coefficient? Is this coefficient significant at a 5% level of significance (alpha=0.05)? (Hint: Check the P-value for weight in the Python output.) See Step 4 in the Python script.
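The correlation question can be checked by hand. The sketch below is a minimal illustration, not the course script: it computes Pearson's r in plain Python from the five (wt, mpg) rows displayed in the sample data frame, then labels the strength. The helper names `pearson_r` and `describe_strength` and the 0.7 / 0.3 cutoffs are assumptions made for illustration; textbooks differ on where "strong" begins. Because only five of the 30 sampled rows are used, the result will not match the Step 3 value.

```python
import math

# (wt, mpg) pairs copied from the five displayed rows of the sample data frame.
ROWS = [
    (2.620, 21.0),  # Mazda RX4
    (5.250, 10.4),  # Cadillac Fleetwood
    (3.440, 18.7),  # Hornet Sportabout
    (3.570, 14.3),  # Duster 360
    (1.935, 27.3),  # Fiat X1-9
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r, strong=0.7, weak=0.3):
    """Label |r| with rule-of-thumb cutoffs (hypothetical, not from the course)."""
    size = abs(r)
    if size >= strong:
        return "strong"
    if size >= weak:
        return "moderate"
    return "weak"


weights = [wt for wt, _ in ROWS]
mpgs = [mpg for _, mpg in ROWS]
r = pearson_r(weights, mpgs)
sign = "negative" if r < 0 else "positive"
print(f"r = {r:.3f}: {describe_strength(r)} {sign} correlation")
```

The sign of r gives the direction of the relationship and its magnitude gives the strength, so both parts of the question can be answered from this one number.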
See images for more detail
Step 1: Generating cars dataset

This block of Python code will generate the sample data for you. You will not be generating the dataset using the numpy module this week. Instead, the dataset will be imported from a CSV file. To make the data unique to you, a random sample of size 30, without replacement, will be drawn from the data in the CSV file. The data set will be saved into a Python dataframe, which you will use in later calculations.

Click the block of code below and hit the Run button above.

In [34]:

    import pandas as pd
    from IPython.display import display, HTML

    # read data from mtcars.csv data set.
    cars_df_orig = pd.read_csv("https://s3-us-west-2.amazonaws.com/data-analytics.zybooks.com/mtcars.csv")

    # randomly pick 30 observations without replacement from mtcars dataset to make the data unique to you.
    cars_df = cars_df_orig.sample(n=30, replace=False)

    # print only the first five observations in the data set.
    print("\nCars data frame (showing only the first five observations)")
    display(HTML(cars_df.head().to_html()))

Cars data frame (showing only the first five observations)

        Unnamed: 0           mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
    0   Mazda RX4           21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
    14  Cadillac Fleetwood  10.4    8  472.0  205  2.93  5.250  17.98   0   0     3     4
    4   Hornet Sportabout   18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2
    6   Duster 360          14.3    8  360.0  245  3.21  3.570  15.84   0   0     3     4
    25  Fiat X1-9           27.3    4   79.0   66  4.08  1.935  18.90   1   1     4     1

Step 2: Scatterplot of miles per gallon against weight

The block of code below will create a scatterplot of miles per gallon (coded as mpg in the data set) and weight of the car (coded as wt).

Click the block of code below and hit the Run button above. NOTE: If the plot is not created, click the code section and hit the Run button again.

In [3]:

    import matplotlib.pyplot as plt

    # create scatterplot of variables mpg against wt.
    plt.plot(cars_df["wt"], cars_df["mpg"], 'o', color='red')

    # set a title for the plot, x-axis, and y-axis.
    plt.title('MPG against Weight')
    plt.xlabel('Weight (1000s lbs)')
    plt.ylabel('MPG')

    # show the plot.
    plt.show()

[Scatterplot: MPG against Weight; x-axis: Weight (1000s lbs); y-axis: MPG]

Step 3: Correlation coefficient for miles per gallon and weight

Now you will calculate the correlation coefficient between the miles per gallon and weight variables. The corr method of a dataframe returns the correlation matrix with correlation coefficients between all variables in the dataframe. In this case, you will specify to only return the matrix for the variables "miles per gallon" and "weight".

Click the block of code below and hit the Run button above.

In [4]:

    # create correlation matrix for mpg and wt.
    # the correlation coefficient between mpg and wt is contained in the cell for mpg row and wt column (or wt row and mpg column)
    mpg_wt_corr = cars_df[['mpg', 'wt']].corr()
    print(mpg_wt_corr)

              mpg        wt
    mpg  1.000000 -0.861287
    wt  -0.861287  1.000000

Step 4: Simple linear regression model to predict miles per gallon using weight

The block of code below produces a simple linear regression model using "miles per gallon" as the response variable and "weight" (of the car) as a predictor variable. The ols method in the statsmodels.formula.api submodule returns all statistics for this simple linear regression model.

Click the block of code below and hit the Run button above.

In [5]:

    from statsmodels.formula.api import ols

    # create the simple linear regression model with mpg as the response variable and weight as the predictor variable
    model = ols('mpg ~ wt', data=cars_df).fit()

    # print the model summary
    print(model.summary())

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                    mpg   R-squared:                       0.742
    Model:                            OLS   Adj. R-squared:                  0.733
    Method:                 Least Squares   F-statistic:                     80.45
    Date:                Thu, 03 Feb 2022   Prob (F-statistic):           1.00e-09
    Time:                        22:36:06   Log-Likelihood:                -75.264
    No. Observations:                  30   AIC:                             154.5
    Df Residuals:                      28   BIC:                             157.3
    Df Model:                           1
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept     37.9890      2.035     18.665      0.000      33.820      42.158
    wt            -5.6186      0.626     -8.969      0.000      -6.902      -4.335
    ==============================================================================
    Omnibus:                        4.395   Durbin-Watson:                   1.972
    Prob(Omnibus):                  0.111   Jarque-Bera (JB):                3.458
    Skew:                           0.831   Prob(JB):                        0.177
    Kurtosis:                       3.050   Cond. No.                         12.8
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
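The Step 4 output contains everything needed to write the regression equation and judge the slope. The sketch below is only an illustration for the sample shown here; every student's sample, and therefore every number, will differ. It copies the reported coefficients into constants, forms the fitted line mpg = 37.9890 - 5.6186 * wt, and checks significance two ways: the printed P-value against alpha, and whether the 95% confidence interval excludes zero. The function name `predict_mpg` and the 3.0 example weight are hypothetical.

```python
# Values copied from the Step 3 and Step 4 output for this particular sample.
INTERCEPT = 37.9890
SLOPE = -5.6186              # coefficient on wt (mpg change per 1000 lbs)
SLOPE_P_VALUE = 0.000        # P>|t| as printed, rounded to three decimals
SLOPE_CI = (-6.902, -4.335)  # 95% confidence interval for the slope
R_SQUARED = 0.742
CORRELATION = -0.861287
ALPHA = 0.05


def predict_mpg(weight_thousand_lbs):
    """Predicted mpg from the fitted line for a weight given in 1000s of lbs."""
    return INTERCEPT + SLOPE * weight_thousand_lbs


# The slope is significant at the 5% level if its p-value is below alpha.
slope_is_significant = SLOPE_P_VALUE < ALPHA

# Equivalent check: the 95% confidence interval does not contain zero.
ci_excludes_zero = not (SLOPE_CI[0] <= 0.0 <= SLOPE_CI[1])

# In simple linear regression, R-squared equals the squared correlation.
r_squared_matches = abs(CORRELATION ** 2 - R_SQUARED) < 5e-4

print(f"mpg = {INTERCEPT:.4f} + ({SLOPE:.4f}) * wt")
print(f"Predicted mpg at wt = 3.0: {predict_mpg(3.0):.2f}")
print(f"Slope significant at alpha = {ALPHA}: {slope_is_significant}")
print(f"95% CI excludes zero: {ci_excludes_zero}")
print(f"R-squared matches r squared: {r_squared_matches}")
```

Read this way, each additional 1000 lbs of weight is associated with about 5.6 fewer miles per gallon in this sample, which is the kind of estimate the rental company could use when pricing heavier vehicles.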