Question

Need help with Q3

### Allowable packages

The only allowable package is `tidyverse`. You should not use any other packages as CodeGrade is not set up to accept them on this assignment.

### Data Set

The data set for this assignment is called **`avocados`**. The data was collected by the Hass Avocado Board to track the price and sales volume of Hass avocados from 2015 to 2023. More information is available on Kaggle [here](https://www.kaggle.com/datasets/vakhariapujan/avocado-prices-and-sales-volume-2015-2023?resource=download).

You are expected to use pipe notation.

Before we run our correlations and regressions, we need to clean our data and prepare it for different types of regressions. All of the changes should be made in and saved as the `avocados` dataframe. First, remove all rows for regions at the city and state level and the entire United States. Hint: look at the `region` column for values that do not indicate a region related to city or state; there are eight regions that should remain.
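As a minimal sketch of the row filter described above, the snippet below uses a small made-up tibble in place of the real data. The vector `keep_regions` is hypothetical and deliberately incomplete: the actual eight region labels are not listed in the assignment text and should be read from `unique(avocados$region)`.

```r
library(tidyverse)

# Toy stand-in for the real data; these rows and labels are illustrative only.
avocados <- tibble(
  region       = c("Northeast", "Boston", "TotalUS", "West", "California"),
  AveragePrice = c(1.50, 1.62, 1.41, 1.33, 1.48)
)

# Hypothetical list of region-level labels (assumption). Replace it with the
# eight labels found by inspecting unique(avocados$region) in the real data.
keep_regions <- c("Northeast", "West")

# Keep only rows whose region is in the list, saving back to avocados.
avocados <- avocados %>%
  filter(region %in% keep_regions)
```

Listing the labels to keep, rather than the ones to drop, makes it easy to check that exactly eight regions remain.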

### Next, create a new variable called `type_bin` to convert values in the `type` column into binary values. Organic avocados should be labeled as `1`. The updated dataframe should include only the following six columns: `Year`, `Month`, `AveragePrice`, `plu4225`, `type_bin`, and `region` (you will use `plu4225` later).

### Finally, save the first five rows of the updated dataframe as Q1
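The recoding and column selection for Q1 might look like the sketch below. It runs on made-up rows and assumes the `type` column holds the strings `"organic"` and `"conventional"`; check the real values before relying on that. The `TotalVolume` column is only there to show an extra column being dropped.

```r
library(tidyverse)

# Toy stand-in for the region-filtered data (values are made up).
avocados <- tibble(
  Year         = c(2015, 2015, 2016, 2016, 2017, 2017),
  Month        = c(1, 2, 1, 2, 1, 2),
  AveragePrice = c(1.10, 1.75, 1.20, 1.80, 1.30, 1.95),
  plu4225      = c(100, 20, 110, 25, 120, 30),
  TotalVolume  = c(500, 90, 520, 95, 540, 99),
  type         = rep(c("conventional", "organic"), 3),
  region       = rep("Northeast", 6)
)

avocados <- avocados %>%
  # Organic becomes 1; every other type becomes 0 (assumed labels).
  mutate(type_bin = if_else(type == "organic", 1, 0)) %>%
  # Keep only the six requested columns, in the requested order.
  select(Year, Month, AveragePrice, plu4225, type_bin, region)

# First five rows of the updated dataframe.
Q1 <- avocados %>%
  head(5)
```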

Before moving on, we will check for and remove any outliers in the `AveragePrice` column. Use the 3SD method (treat values more than three standard deviations from the mean as outliers) to identify and remove outliers from this column; `avocados` should now reflect the data with the outliers removed.

### Save the number of rows in the updated `avocados` dataset (after outlier removal) to Q2.

- Assign the number of rows remaining in the dataframe to Q2
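A minimal sketch of the 3SD rule on made-up prices follows. It assumes the mean and standard deviation are computed over the whole `AveragePrice` column (not per region) and that values exactly three standard deviations away are kept.

```r
library(tidyverse)

# Toy prices: twenty ordinary values and one extreme value (made up).
avocados <- tibble(
  AveragePrice = c(rep(1.0, 10), rep(1.2, 10), 9.0)
)

# Keep rows within three standard deviations of the column mean.
avocados <- avocados %>%
  filter(abs(AveragePrice - mean(AveragePrice)) <= 3 * sd(AveragePrice))

# Number of rows remaining after outlier removal.
Q2 <- avocados %>%
  nrow()
```

In this toy example only the value `9.0` falls outside the cutoff, so 20 rows remain.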

Q3: Create a correlation matrix of the relationships between `Year`, `Month`, `AveragePrice`, and `type_bin` for the Northeast region.

- Assign the matrix to Q3. It should look something like this:

```
             Year    Month   AveragePrice type_bin
Year         [value] [value] [value]      [value]
Month        [value] [value] [value]      [value]
AveragePrice [value] [value] [value]      [value]
type_bin     [value] [value] [value]      [value]
```
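One way to build the Q3 matrix is sketched below, using made-up rows. It assumes the Northeast rows are labeled exactly `"Northeast"` in the `region` column and uses the default Pearson method of base R's `cor()`.

```r
library(tidyverse)

# Toy stand-in for the cleaned data (values are made up).
avocados <- tibble(
  Year         = c(2015, 2016, 2017, 2018, 2015, 2016),
  Month        = c(1, 4, 7, 10, 3, 6),
  AveragePrice = c(1.10, 1.85, 1.30, 2.05, 1.00, 1.20),
  type_bin     = c(0, 1, 0, 1, 0, 1),
  region       = c(rep("Northeast", 4), rep("West", 2))
)

Q3 <- avocados %>%
  # Restrict to the Northeast region (label is an assumption).
  filter(region == "Northeast") %>%
  # cor() needs numeric columns only, so drop region and anything else.
  select(Year, Month, AveragePrice, type_bin) %>%
  cor()
```

The result is a 4 x 4 matrix whose row and column names follow the order given in `select()`, matching the layout shown above.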
