As usual, let's do some exploratory data analysis using the concepts taught over the first three weeks!

Firstly, let us examine the first few rows of the dataset just to have a look:

[47]:

#Examine the first few rows of the cars dataset.

head(cars)

A data.frame: 6 × 2

  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

Recall from computer lab worksheet 1 that one way to get a quick overview of the data is to use the str( ) function to obtain the structure of the dataset. It gives us useful information such as the number of observations, number of variables, names of the variables and the data types of the variables.

[48]:

str(cars)

'data.frame': 50 obs. of 2 variables:

$ speed: num 4 4 7 7 8 9 10 10 10 11 ...

$ dist : num 2 10 4 22 16 10 18 26 34 17 ...

In this dataset, the stopping distance of each car (dist) is the response variable, while the speed of each car (speed) is the regressor. The variables representing the response variable and regressor have already been defined for you below:

[49]:

# Define the response and regressor variables

dist <- cars$dist

speed <- cars$speed

Exercise 3.1.1: Plot a histogram for the response variable (dist). (Hint: Remember that we need to make use of the hist( ) function.)

[50]:

# Draw a histogram for the variable dist.

?

Exercise 3.1.2: Comment on the histogram that you plotted in Exercise 3.1.1. (Remember to start your comment using the # symbol)

[51]:

# Comment on the histogram for dist.

?

Exercise 3.1.3: Plot a scatterplot of dist versus speed. (Hint: Refer back to Section 2.1 for help!)

[52]:

# Plot the scatterplot of dist versus speed.

?

Great! Let us now proceed to calculate the following quantities so that we can calculate the estimated regression coefficients for the regression line that we aim to fit later on.

Exercise 3.1.4: Compute $n$ by finding the length of the vector for the response variable. Assign it to the variable cars_n and print it.

[53]:

# Create the variable cars_n:

?

Exercise 3.1.5: Compute $\sum_{i=1}^{n} x_i$. Assign it to the variable sum_speed and print it.

[54]:

# Create the variable sum_speed:

?

Exercise 3.1.6: Compute $\sum_{i=1}^{n} y_i$. Assign it to the variable sum_dist and print it.

[55]:

# Create the variable sum_dist:

?

Exercise 3.1.8: Compute $\sum_{i=1}^{n} x_i^2$. Assign it to the variable sum_speedsq and print it.

[56]:

# Create the variable sum_speedsq:

?

Exercise 3.1.9: Compute $\sum_{i=1}^{n} x_i y_i$. Assign it to the variable cars_sum_cross and print it.

[57]:

# Create the variable cars_sum_cross:

?

Exercise 3.1.10:

Compute $b_1$, the least squares estimate for the $\beta_1$ regression coefficient.

Recall that the formula for computing $b_1$ is given by

$$b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{n\sum_{i=1}^{n}x_i y_i - \left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right)}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}.$$

Assign it to the variable cars_b1 and print it.
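As a rough check that the two forms of the slope formula agree, here is a minimal sketch using R's built-in cars dataset. The variable names follow the worksheet, except b1_deviation, which is a hypothetical name introduced only for this comparison.

```r
# Sketch only: compares the deviation form and the computational form of b1.
speed <- cars$speed   # regressor x
dist  <- cars$dist    # response y

# Summary quantities from Exercises 3.1.4 to 3.1.9
cars_n         <- length(dist)
sum_speed      <- sum(speed)
sum_dist       <- sum(dist)
sum_speedsq    <- sum(speed^2)
cars_sum_cross <- sum(speed * dist)

# Deviation form: sum of cross-deviations over sum of squared deviations
# (b1_deviation is a hypothetical name, not part of the worksheet)
b1_deviation <- sum((speed - mean(speed)) * (dist - mean(dist))) /
  sum((speed - mean(speed))^2)

# Computational form built from the raw sums
cars_b1 <- (cars_n * cars_sum_cross - sum_speed * sum_dist) /
  (cars_n * sum_speedsq - sum_speed^2)

print(cars_b1)
print(all.equal(b1_deviation, cars_b1))
```

Both forms should give the same slope up to floating-point rounding; the computational form only needs the five summary quantities computed earlier.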

[58]:

# Compute and create the variable cars_b1:

?

Exercise 3.1.11:

Compute $b_0$, the least squares estimate for the $\beta_0$ regression coefficient.

Recall that the formula for computing $b_0$ is given by

$$b_0 = \bar{y} - b_1\bar{x} = \frac{\sum_{i=1}^{n}y_i - b_1\sum_{i=1}^{n}x_i}{n}.$$

Assign it to the variable cars_b0 and print it.
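The intercept formula can be sketched the same way. This is an illustration under the assumption that the slope is computed with the computational form from Exercise 3.1.10; b0_means is a hypothetical name used only to show the equivalent "means" form.

```r
# Sketch only: intercept from the sums, checked against the means form.
speed <- cars$speed
dist  <- cars$dist
cars_n <- length(dist)

# Slope from the computational formula (needed before the intercept)
cars_b1 <- (cars_n * sum(speed * dist) - sum(speed) * sum(dist)) /
  (cars_n * sum(speed^2) - sum(speed)^2)

# Means form: y_bar - b1 * x_bar (b0_means is a hypothetical name)
b0_means <- mean(dist) - cars_b1 * mean(speed)

# Sums form: (sum of y - b1 * sum of x) / n
cars_b0 <- (sum(dist) - cars_b1 * sum(speed)) / cars_n

print(cars_b0)
print(all.equal(b0_means, cars_b0))
```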

[59]:

# Compute and create the variable cars_b0:

?

Exercise 3.1.12: Having computed the estimated regression coefficients, we can now find the equation of the fitted regression line. Remove the # symbol to display the estimated regression line.

[60]:

#cat("The estimated regression line is given by y_hat = ", cars_b0, "+", cars_b1, "x")

Exercise 3.1.13: Plot the scatterplot of dist versus speed again. Then, plot the estimated regression line onto the scatterplot.

[61]:

# Plot estimated regression line onto the scatterplot of dist versus speed.

?

Exercise 3.1.14: Predict the stopping distance of the car when its speed is 13.7 mph using the equation of the fitted regression line that you have found earlier. Assign the prediction to the variable dist_pred and print it.

[62]:

# Prediction when speed is 13.7 mph.

?

3.2 Performing simple linear regression using built-in R functions

Great! Now, we will perform simple linear regression on the Cars dataset using built-in R functions instead.

Exercise 3.2.1: Fit a simple linear regression model using the lm( ) function. Assign the fitted model to the variable cars_fit and print it.

[63]:

# Define the response and regressor variables

dist <- cars$dist

speed <- cars$speed

?

# Fit a simple linear regression model

?

Exercise 3.2.2: Find the 95% confidence intervals for regression coefficients $\beta_0$ and $\beta_1$ by using the confint( ) function.
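To see what confint( ) computes for a linear model, the sketch below rebuilds the intervals by hand as estimate ± t critical value × standard error. It assumes the model is fitted on the built-in cars data; manual_ci, est, se and t_crit are hypothetical names used only for illustration.

```r
# Sketch only: reproduce confint() for a simple linear regression by hand.
cars_fit <- lm(dist ~ speed, data = cars)

coef_table <- coef(summary(cars_fit))
est <- coef_table[, "Estimate"]
se  <- coef_table[, "Std. Error"]

# Two-sided 95% interval uses the 0.975 quantile of t with n - 2 df
t_crit <- qt(0.975, df = df.residual(cars_fit))

manual_ci <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)

print(manual_ci)
print(confint(cars_fit, level = 0.95))
```

The two printed matrices should match apart from the column labels.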

[64]:

# Obtain the 95% confidence intervals for the regression parameters.

?

Exercise 3.2.3: Once again, predict the stopping distance of the car when its speed is 13.7 mph and show the 95% confidence interval for the mean response.

Hint:

  • First, create a dataframe containing the new data for a single prediction and assign it to the variable new_speed.
  • Make the prediction using the predict( ) function and assign the prediction to dist_pred_confint and print it.
  • Remember to set the correct string value under the interval argument to calculate the confidence interval and the correct level as well!
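The steps in the hint can be sketched as follows. This is one possible arrangement, assuming the model is fitted directly on the built-in cars data so that the column name in new_speed matches the regressor name.

```r
# Sketch only: point prediction plus 95% confidence interval for the mean response.
cars_fit <- lm(dist ~ speed, data = cars)

# The new data must use the same column name as the regressor in the model
new_speed <- data.frame(speed = 13.7)

dist_pred_confint <- predict(cars_fit, newdata = new_speed,
                             interval = "confidence", level = 0.95)
print(dist_pred_confint)
```

The result is a one-row matrix with columns fit, lwr and upr: the predicted value and the lower and upper interval limits.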

[65]:

# Obtain prediction together with the 95% confidence interval for the mean response.

?

Exercise 3.2.4: This time, predict the stopping distance of the car when its speed is 13.7 mph and show the 95% prediction interval for the predicted response.

Hint:

  • You have already created a dataframe containing the new data for a single prediction and have assigned it the variable new_speed.
  • Make the prediction using the predict( ) function and assign the prediction to dist_pred_predint and print it.
  • Remember to set the correct string value under the interval argument to calculate the prediction interval and the correct level as well!
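A minimal sketch of the prediction interval, again assuming a fit on the built-in cars data. The width comparison at the end is an addition for illustration (pred_width and conf_width are hypothetical names): a prediction interval for a single new observation is wider than the confidence interval for the mean response at the same speed.

```r
# Sketch only: 95% prediction interval, compared with the confidence interval.
cars_fit  <- lm(dist ~ speed, data = cars)
new_speed <- data.frame(speed = 13.7)

dist_pred_predint <- predict(cars_fit, newdata = new_speed,
                             interval = "prediction", level = 0.95)
print(dist_pred_predint)

# Confidence interval for the mean response at the same speed, for comparison
dist_pred_confint <- predict(cars_fit, newdata = new_speed,
                             interval = "confidence", level = 0.95)

# Interval widths (hypothetical helper names)
pred_width <- unname(dist_pred_predint[1, "upr"] - dist_pred_predint[1, "lwr"])
conf_width <- unname(dist_pred_confint[1, "upr"] - dist_pred_confint[1, "lwr"])
print(c(prediction = pred_width, confidence = conf_width))
```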

[66]:

# Obtain the prediction together with the 95% prediction interval for the predicted response.

?

Exercise 3.2.5: Obtain the residuals resulting from the fitted regression model using the residuals( ) function. Assign it to the variable cars_residuals and print it.

[67]:

# Obtain residuals from fitted regression model:

?

Exercise 3.2.6: Construct a residual plot where you plot the residuals against the car speeds.

[68]:

# Construct a residual plot (residuals vs speed):

?

Exercise 3.2.7: Observe the residual plot that you have just constructed. Do the residuals look independent? (Remember to start your comment using the # symbol)

[69]:

# Comment on whether the residuals look independent.

?

Exercise 3.2.8: Do the residuals have a constant variance throughout the plot? (Remember to start your comment using the # symbol)

[70]:

# Comment on the variance of the residuals.

?

3.3 Performing simple linear regression using built-in R functions (With transformations!)

When the residuals suggest that the model assumptions are violated (for example, the errors are not normally distributed, the residuals do not look independent, or the residuals do not have a homogeneous variance), performing a transformation on the response variable might help resolve these issues.

Here, we carry out a square root transformation on the response variable (stopping distance of the car). The re-defined variables have already been coded for you below:

[71]:

sqrt_dist <- sqrt(cars$dist)

speed <- cars$speed
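One way to judge whether the square root transformation helps is to put the residual plots of the two models side by side. The sketch below is an illustration only, assuming both models are fitted on the built-in cars data; the plot titles and axis labels are arbitrary choices.

```r
# Sketch only: compare residuals before and after the square root transformation.
speed     <- cars$speed
dist      <- cars$dist
sqrt_dist <- sqrt(dist)

cars_fit     <- lm(dist ~ speed)        # original response
cars_new_fit <- lm(sqrt_dist ~ speed)   # transformed response

# Draw the two residual plots next to each other, then restore the layout
old_par <- par(mfrow = c(1, 2))
plot(speed, residuals(cars_fit),
     xlab = "speed", ylab = "residuals", main = "Response: dist")
abline(h = 0, lty = 2)
plot(speed, residuals(cars_new_fit),
     xlab = "speed", ylab = "residuals", main = "Response: sqrt(dist)")
abline(h = 0, lty = 2)
par(old_par)
```

When reading the plots, look for whether the vertical spread of the points around the dashed zero line changes as speed increases.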

Exercise 3.3.1: Plot a histogram for the transformed response variable (sqrt_dist). (Hint: Remember that we need to make use of the hist( ) function.)

[72]:

# Draw a histogram for the variable sqrt_dist.

?

Exercise 3.3.2: Comment on the histogram that you plotted in Exercise 3.3.1. (Remember to start your comment using the # symbol)

[73]:

# Comment on the histogram for sqrt_dist.

?

We will now conduct the simple linear regression analysis again using the transformed response variable sqrt_dist instead.

Exercise 3.3.3: Fit a simple linear regression model using the variable sqrt_dist and the lm( ) function. Assign the new fitted model to the variable cars_new_fit and print it.

[74]:

# Fit a simple linear regression model

?

Exercise 3.3.4: Find the new 95% confidence intervals for the regression coefficients $\beta_0$ and $\beta_1$ by using the confint( ) function.

[75]:

# Obtain the new 95% confidence intervals for the regression parameters.

?

Exercise 3.3.5: Obtain the residuals resulting from the new fitted regression model using the residuals( ) function. Assign it to the variable cars_new_residuals and print it.

[76]:

# Obtain residuals from new fitted regression model:

?

Exercise 3.3.6: Construct a new residual plot where you plot the new residuals against the car speeds.

[77]:

# Construct a residual plot (residuals vs speed):

?

Exercise 3.3.7: Observe the residual plot that you have just constructed. Do the residuals look independent? (Remember to start your comment using the # symbol)

[78]:

# Comment on whether the new residuals look independent.

?

Exercise 3.3.8: Do the residuals have a constant variance throughout the plot? (Remember to start your comment using the # symbol)

[79]:

# Comment on the variance of the new residuals.

?

Exercise 3.3.9: Now, we will find out if our fitted model is statistically significant. Compute the summary of the new fitted model by using the summary( ) function.

[80]:

# Summary of the new fitted model:

?

Hypothesis testing on the slope using a t-test:

Firstly, let us set a 0.05 level of significance.

The hypotheses are $H_0: \beta_1 = 0$ and $H_1: \beta_1 \neq 0$.
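The t-test on the slope can be sketched by hand from the coefficient table. This is an illustration under the assumption that the transformed model is fitted on the built-in cars data; t_stat, p_value and alpha are hypothetical names.

```r
# Sketch only: two-sided t-test for the slope, rebuilt from the coefficient table.
sqrt_dist <- sqrt(cars$dist)
speed     <- cars$speed
cars_new_fit <- lm(sqrt_dist ~ speed)

coef_table <- coef(summary(cars_new_fit))

# Test statistic: estimated slope divided by its standard error
t_stat <- coef_table["speed", "Estimate"] / coef_table["speed", "Std. Error"]

# Two-sided P-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df.residual(cars_new_fit),
                  lower.tail = FALSE)

alpha <- 0.05
print(c(t = t_stat, p = p_value))
print(p_value < alpha)   # TRUE means H0 is rejected at the 0.05 level
```

The same P-value appears in the Pr(>|t|) column of the speed row in the summary( ) output.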

Exercise 3.3.10: What is the P-value for this hypothesis test? (Remember to start your comment using the # symbol)

[81]:

# Find the P-value.

?

Exercise 3.3.11: What is the conclusion made for this hypothesis test? (Remember to start your comment using the # symbol)

[82]:

# State conclusion resulting from the hypothesis test

?

Exercise 3.3.12: Interpret the (multiple) R-squared value. Does it indicate a good model fit? (Remember to start your comment using the # symbol)

[83]:

# Interpret and comment on the R^2 value:
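For interpretation, it may help to recall that the multiple R-squared is the proportion of variation in the response explained by the model, 1 - SSE/SST. The sketch below recomputes it by hand, assuming the transformed model is fitted on the built-in cars data; sse, sst and r_squared are hypothetical names.

```r
# Sketch only: R-squared as 1 - SSE/SST for the transformed model.
sqrt_dist <- sqrt(cars$dist)
speed     <- cars$speed
cars_new_fit <- lm(sqrt_dist ~ speed)

sse <- sum(residuals(cars_new_fit)^2)          # unexplained variation
sst <- sum((sqrt_dist - mean(sqrt_dist))^2)    # total variation

r_squared <- 1 - sse / sst
print(r_squared)
print(summary(cars_new_fit)$r.squared)
```

In simple linear regression this value also equals the squared sample correlation between the response and the regressor.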
