Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Jun 29, 2024

https://drive.google.com/drive/folders/1F1VdLvOeg0e5aS1CAKzXMIkiy-CnOx-P?usp=sharing data if you click and download --- title: ISYE6414 - Midterm Exam 1 - Open Book Section (R) - Part 2 output: pdf_document: default

https://drive.google.com/drive/folders/1F1VdLvOeg0e5aS1CAKzXMIkiy-CnOx-P?usp=sharing

data if you click and download

--- title: "ISYE6414 - Midterm Exam 1 - Open Book Section (R) - Part 2" output: pdf_document: default html_document: default ---

```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ```

## Instructions

This R Markdown file includes the questions, the empty code chunk sections for your code, and the text blocks for your responses. Answer the questions below by completing this R Markdown file. You may make slight adjustments to get the file to knit/convert but otherwise keep the formatting the same. Once you've finished answering the questions, submit your responses in a single knitted file as *HTML* only.

There are 13 questions total, each worth between 3-8.5 points. Partial credit may be given if your code is correct but your conclusion is incorrect or vice versa.

*Next Steps:*

1. Save this .Rmd file in your R working directory - the same directory where you will download the "USA_cars_datasets.csv" data file into. Having both files in the same directory will help in reading the .csv file.

2. Read the question and create the R code necessary within the code chunk section immediately below each question. Knitting this file will generate the output and insert it into the section below the code chunk.

3. Type your answer to the questions in the text block provided immediately after the response prompt.

4. Once you've finished answering all questions, knit this file and submit the knitted file as *HTML* on Canvas.

### Mock Example Question

This will be the exam question - each question is already copied from Canvas and inserted into individual text blocks below, *you do not need to copy/paste the questions from the online Canvas exam.*

```{r} # Example code chunk area. Enter your code below the comment`

```

**Mock Response to Example Question**: This is the section where you type your written answers to the question. Depending on the question asked, your typed response may be a number, a list of variables, a few sentences, or a combination of these elements.

**Ready? Let's begin. We wish you the best of luck!**

**Recommended Packages** ```{r} library(car) ```

## Car Price Data Analysis

For this exam, you will be building a model to predict the price of second-hand cars (*price*).

The "USA_cars_datasets.csv" data set consists of the following variables:

* *price*: price of the second-hand car * *brand*: brand of the car * *year*: year of the car * *title_status*: clean vehicle or salvage insurance * *mileage*: mileage driven of the car * *color*: color of the car * *users*: number of previous users of the car

Read the data and answer the questions below. Assume a significance threshold of 0.05 for hypothesis tests unless stated otherwise.

```{r} # Read the data set cars = read.csv('USA_cars_datasets.csv', header=TRUE) #Set brand and title_status as categorical cars$brand<-as.factor(cars$brand) cars$title_status<-as.factor(cars$title_status) head(cars) ```

**Note:** For all of the following questions, treat all variables as quantitative variables except for *brand*, and *title_status*. They have already been converted to categorical variables in the above code.

### Question 1 - 5pts

Create an ANOVA model, called **anovamodel**, to compare the mean car price (*price*) among the different car brands (*brands*). Display the corresponding ANOVA table.

A) Identify the value of the mean squared error (MSE) from the ANOVA table.

B) Provide the formula that is used to calculate the MSE in the table, and clearly explain what this quantity represents in the context of ANOVA.

```{r} #Code to create ANOVA model...

```

**Response to Question 1A**:

**Response to Question 1B**:

**Formula**:

**Explanation**:

### Question 2 - 4pts

A) What can you conclude from the ANOVA table with respect to the test of equal means at a significance level of 0.05 (FAIL TO REJECT or REJECT the null hypothesis)? Explain your answer using the values from the ANOVA Table.

B) Given your answer in part A, explain what this conclusion means in the context of the car data analysis problem.

**Response to Question 2A**:

**Response to Question 2B**:

### Question 3 - 5.5 pts

Conduct a pairwise comparison of the mean *price* for the different *brands* using the Tukey method. Use a **90%** confidence level for this comparison.

A) According to the pairwise comparison, are the means of *audi* and *acura* plausibly equal? Explain how you came to your conclusion.

B) Provide an interpretation of "diff" in the context of the price between *audi* and *acura*. *(Note: provide this interpretation regardless of the means being statistically significantly different/equal)*

```{r} # Code to create pairwise-comparison...

```

**Response to Question 3A**:

**Response to Question 3B**:

### Question 4 - 8.5 pts

Perform a residual analysis on the ANOVA model (**anovamodel**). State whether each of the three ANOVA model assumptions holds and why you came to the conclusion. ```{r} # Code to perform residual analysis

```

**Response to Question 4**:

**Assumption 1**:

**Assumption 2**:

**Assumption 3**:

### Question 5 - 4 pts

Based on your assessment of the assumptions (Question 4), is the ANOVA model (**anovamodel**) a good fit? If not, how can you try to improve the fit? Provide two recommendations and specify which problem(s) could be improved by each recommendation. *(Note: Do not apply your recommendations.)*

**Response to Question 5**:

**Assessment on GOF**:

**Recommendation 1 and problem(s)**:

**Recommendation 2 and problem(s)**:

### Question 6 - 5pts

Now consider the quantitative predicting variables ONLY: *year*, *mileage*, *color*, *users*.

Compute the correlation coefficient (*r*) between each quantitative variable and the response (*price*).

A) Which predicting variable has the strongest linear relationship with the response? Which one has the weakest linear relationship?

B) Interpret the value of the strongest correlation coefficient in the context of the problem. Include strength (weak, moderate, strong) and direction (positive, negative).

```{r} # Code to calculate correlation...

```

**Response to Question 6A**:

**Var with strongest correlation**:

**Var with weakest correlation**:

**Response to Question 6B**:

### Question 7 - 4pts

Create a scatter plot to describe the relationship between the car price (*price*) and the mileage driven by the car (*mileage*). Describe the general trend (direction and form).

```{r} # Code to create scatter plot...

```

**Response to Question 7**:

### Question 8 - 4pts

Create a linear regression model, called **lm.full**, with *price* as the response variable and with **ALL** remaining variables (quantitative and qualitative) as the predictors. Display the summary table for the model. *Note: Treat all variables as quantitative variables except for brand, and title_status. Include an intercept.*

A) Is the model significant overall using an alpha of 0.01? Why/Why not?

```{r} # Code to create model and display summary table

```

**Response to Question 8A**:

### Question 9 - 3pts

What is the estimated coefficient for *year* in **lm.full**? Provide a brief meaningful interpretation of the estimated regression coefficient for *year* in **lm.full**.

**Response to Question 9**:

### Question 10 - 4pts

What are the bounds for a **95%** confidence interval on the coefficient for *users*? Using this confidence interval, is the coefficient for *users* plausibly equal to zero at this confidence level, given all other predictors in the model? Explain.

```{r} # Code to calculate 95% CI...

```

**Response to Question 10**:

### Question 11 - 4pts

Create a third model, called **lm.full2**, by adding an interactive term for mileage driven per user (mileage/users) to **lm.full**. Display the summary table for this model.

A) Examine the summary tables for **lm.full2** and **lm.full**. Is there any significant change in the direction and/or statistical significance of the regression coefficients? If so, list one change.

```{r} # Code to create lm.full2 and display summary table

```

**Response to Question 11A**:

### Question 12 - 4pts

Perform a Partial F-test on the new model (**lm.full2**) vs the previous model (**lm.full**), using $\alpha=0.05$.

A) State the Null and Alternative Hypothesis for this Partial F-test.

B) Do you *reject* or *fail to reject* the null hypothesis? Explain your answer using the output.

C) Based on these results, does the new variable (the interactive term for mileage driven per user) add to the explanatory power of the model, given all other predictors are included? (Yes or No should suffice in conjunction w/ 12B)

```{r} # Code for Partial F-Test...

```

**Response to Question 12A**:

**Response to Question 12B**:

**Response to Question 12C**:

### Question 13 - 5pts

Using **lm.full** model, what is the predicted *price* and corresponding **90% prediction interval** for a **clean** **2016** **Toyota** with **2** prior users, color **1** and **10,000** miles? Provide an interpretation of your results.

*(Note: The data point has been provided. Ensure you are using lm.full not lm.full2)*

```{r} # new observation... newpt <- data.frame(brand='toyota', year=2016, title_status="clean vehicle", users=2, color= 1, mileage=10000)

# Code for prediction interval...

```

**Response to Question 13**:

**The End.**