Question

Use the link in the Jupyter Notebook activity to access your Python script. Once you have made your calculations, complete this discussion. The script will output answers to the questions given below. You must attach your Python script output as an HTML file and respond to the questions below.

In this discussion, you will apply the statistical concepts and techniques covered in this week's reading about correlation coefficient and simple linear regression. A car rental company wants to evaluate the premise that heavier cars are less fuel efficient than lighter cars. In other words, the company expects that fuel efficiency (miles per gallon) and weight of the car (often measured in thousands of pounds) are correlated. Performing this analysis will help the company optimize its business model and charge its customers appropriately.

In this discussion, you will work with a cars data set that includes two variables:

  • Miles per gallon (coded as mpg in the data set)
  • Weight of the car (coded as wt in the data set)

The random sample will be drawn from a CSV file. This data will be unique to you, and therefore your answers will be unique as well. Run Step 1 in the Python script to generate your unique sample data.

In your initial post, address the following items:

  1. You created a scatterplot of miles per gallon against weight; check to make sure it was included in your attachment. Does the graph show any trend? If yes, is the trend what you expected? Why or why not? See Step 2 in the Python script.
  2. What is the coefficient of correlation between miles per gallon and weight? What is the sign of the correlation coefficient? Does the coefficient of correlation indicate a strong correlation, weak correlation, or no correlation between the two variables? How do you know? See Step 3 in the Python script.
  3. Write the simple linear regression equation for miles per gallon as the response variable and weight as the predictor variable. How might the car rental company use this model? See Step 4 in the Python script.
  4. What is the slope coefficient? Is this coefficient significant at a 5% level of significance (alpha=0.05)? (Hint: Check the P-value for weight in the Python output.) See Step 4 in the Python script.

Here is my simple linear regression data:

Step 1: Generating cars dataset

This block of Python code will generate the sample data for you. You will not be generating the dataset using the numpy module this week. Instead, the dataset will be imported from a CSV file. To make the data unique to you, a random sample of size 30, without replacement, will be drawn from the data in the CSV file. The data set will be saved into a Python dataframe, which you will use in later calculations.

Click the block of code below and hit the Run button above.

In[1]:

import pandas as pd
from IPython.display import display, HTML

# read data from mtcars.csv data set.
cars_df_orig = pd.read_csv("https://s3-us-west-2.amazonaws.com/data-analytics.zybooks.com/mtcars.csv")

# randomly pick 30 observations without replacement from mtcars dataset to make the data unique to you.
cars_df = cars_df_orig.sample(n=30, replace=False)

# print only the first five observations in the data set.
print(" Cars data frame (showing only the first five observations)")
display(HTML(cars_df.head().to_html()))

Cars data frame (showing only the first five observations)

    Unnamed: 0       mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
26  Porsche 914-2   26.0    4  120.3   91  4.43  2.140  16.70   0   1     5     2
19  Toyota Corolla  33.9    4   71.1   65  4.22  1.835  19.90   1   1     4     1
 3  Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
 1  Mazda RX4 Wag   21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
20  Toyota Corona   21.5    4  120.1   97  3.70  2.465  20.01   1   0     3     1
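The sampling call in Step 1 can be tried on a small stand-in table to see what "without replacement" means in practice. This is only a sketch: `toy_df` and its columns are hypothetical (32 rows, the size of the classic mtcars data set, which the hosted CSV is assumed to match), and `random_state=42` is an assumption added to make the draw repeatable. The assignment's script leaves the seed out on purpose so that each student gets a different sample.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents (not the real mtcars values).
toy_df = pd.DataFrame({
    "car_id": range(32),
    "wt": [2.0 + 0.05 * i for i in range(32)],
})

# Same call as in Step 1, plus a fixed seed (an assumption, for repeatability).
sample_a = toy_df.sample(n=30, replace=False, random_state=42)
sample_b = toy_df.sample(n=30, replace=False, random_state=42)

print(len(sample_a))               # number of rows drawn
print(sample_a.index.is_unique)    # no row is drawn twice when replace=False
print(sample_a.equals(sample_b))   # a fixed seed reproduces the same sample
```

Without `random_state`, each run draws a different set of 30 rows, which is why the correlation and regression output below will differ from one student to the next.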

Step 2: Scatterplot of miles per gallon against weight

The block of code below will create a scatterplot of miles per gallon (coded as mpg in the data set) and weight of the car (coded as wt).

Click the block of code below and hit the Run button above.

NOTE: If the plot is not created, click the code section and hit the Run button again.

In[3]:

import matplotlib.pyplot as plt

# create scatterplot of variables mpg against wt.
plt.plot(cars_df["wt"], cars_df["mpg"], 'o', color='red')

# set a title for the plot, x-axis, and y-axis.
plt.title('MPG against Weight')
plt.xlabel('Weight (1000s lbs)')
plt.ylabel('MPG')

# show the plot.
plt.show()

Step 3: Correlation coefficient for miles per gallon and weight

Now you will calculate the correlation coefficient between the miles per gallon and weight variables. The corr method of a dataframe returns the correlation matrix with correlation coefficients between all variables in the dataframe. In this case, you will select only the variables "miles per gallon" and "weight" so that the matrix covers just those two.

Click the block of code below and hit the Run button above.

In[4]:

# create correlation matrix for mpg and wt.
# the correlation coefficient between mpg and wt is contained in the cell for mpg row and wt column (or wt row and mpg column)
mpg_wt_corr = cars_df[['mpg','wt']].corr()
print(mpg_wt_corr)

          mpg        wt
mpg  1.000000 -0.888844
wt  -0.888844  1.000000
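The coefficient in the matrix above can be interpreted with a small helper. The sketch below is illustrative only: the name `describe_correlation` and the cutoffs (an absolute value of at least 0.7 read as strong, at least 0.3 as moderate, anything smaller as weak) are assumptions, since textbooks draw these lines differently. The value -0.888844 is copied from the output above.

```python
def describe_correlation(r, strong=0.7, moderate=0.3):
    """Return (sign, strength) for a Pearson correlation coefficient r.

    The cutoffs are assumed rules of thumb, not fixed standards.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    sign = "negative" if r < 0 else "positive" if r > 0 else "zero"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return sign, strength

r = -0.888844                      # mpg/wt cell of the correlation matrix
sign, strength = describe_correlation(r)
r_squared = round(r ** 2, 3)       # share of mpg variance explained by wt

print(sign, strength, r_squared)
```

In a simple linear regression with one predictor, the squared correlation equals the model's R-squared, so `r_squared` here should agree with the R-squared value reported in Step 4.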

Step 4: Simple linear regression model to predict miles per gallon using weight

The block of code below produces a simple linear regression model using "miles per gallon" as the response variable and "weight" (of the car) as a predictor variable. The ols function in the statsmodels.formula.api submodule fits the model, and the model summary reports the statistics for this simple linear regression.

Click the block of code below and hit the Run button above.

In[6]:

from statsmodels.formula.api import ols

# create the simple linear regression model with mpg as the response variable and weight as the predictor variable
model = ols('mpg ~ wt', data=cars_df).fit()

# print the model summary
print(model.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.790
Model:                            OLS   Adj. R-squared:                  0.783
Method:                 Least Squares   F-statistic:                     105.4
Date:                Tue, 27 Jul 2021   Prob (F-statistic):           5.39e-11
Time:                        14:57:13   Log-Likelihood:                -73.038
No. Observations:                  30   AIC:                             150.1
Df Residuals:                      28   BIC:                             152.9
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     39.0846      1.914     20.418      0.000      35.163      43.006
wt            -5.9519      0.580    -10.265      0.000      -7.140      -4.764
==============================================================================
Omnibus:                        1.845   Durbin-Watson:                   1.713
Prob(Omnibus):                  0.397   Jarque-Bera (JB):                1.644
Skew:                           0.534   Prob(JB):                        0.439
Kurtosis:                       2.583   Cond. No.                         13.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
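The summary above contains everything needed to write the regression equation and check the slope at the 5% level. The sketch below hard-codes the coefficients, confidence interval, and P-value from this particular output; the names `predict_mpg` and `ci_excludes_zero` are hypothetical helpers, and the 3.0 (thousand lbs) test weight is an arbitrary example, not a value from the assignment.

```python
# Values copied from the OLS summary above (specific to this random sample).
INTERCEPT = 39.0846
SLOPE = -5.9519
SLOPE_CI_95 = (-7.140, -4.764)   # [0.025, 0.975] columns for wt
SLOPE_P_VALUE = 0.000            # printed P>|t| for wt, rounded to 3 decimals
ALPHA = 0.05

def predict_mpg(wt):
    """Predicted mpg for a car weighing `wt` thousand pounds."""
    return INTERCEPT + SLOPE * wt

def ci_excludes_zero(ci):
    """True when both ends of the interval lie on the same side of zero."""
    low, high = ci
    return low > 0 or high < 0

# Regression equation: mpg = 39.0846 - 5.9519 * wt
print(f"mpg = {INTERCEPT} + ({SLOPE}) * wt")

# Example prediction for a hypothetical 3,000 lb car.
print(round(predict_mpg(3.0), 2))

# The slope is significant at alpha = 0.05 if its P-value is below alpha;
# equivalently, its 95% confidence interval does not contain zero.
significant = SLOPE_P_VALUE < ALPHA and ci_excludes_zero(SLOPE_CI_95)
print(significant)
```

Read this way, each additional 1,000 lbs of weight is associated with roughly 5.95 fewer miles per gallon in this sample. Predictions are only sensible for weights within the range covered by the sampled cars.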

I'm not sure how to answer the questions. Can you assist me?
