### Name of File
Name your assignment file **`BRFSS_Part1`**. This is a Quarto markdown file, which has the extension `.qmd`.
### Data Set
- These data come from the [Centers for Disease Control and Prevention](https://www.cdc.gov).
- To answer these questions you will need to use the codebook on Brightspace, called `BRFSS_2021 Codebook`. For part 2 of the project, please note that not all of the variables listed in the codebook are included in the .csv file to be downloaded from Brightspace.
- Download the `brfss2021.csv` file from Brightspace and place it in the same folder/directory as your script file. Then in RStudio, set your Working Directory to your Source File location: in the menus choose Session \| Set Working Directory \| To Source File Location. You will most likely see some warnings after the file loads. `read_csv()` guesses each column's type from a limited number of rows, and with a file this large it does not read enough of them to guess accurately.
- You must use the `read_csv()` function when loading the .csv file. Do not use `read.csv()`.
- Do not rename the .csv file that you download from Brightspace.
- Do not edit the .csv file.
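
The type-guessing warnings mentioned above can be illustrated on a small made-up file. This is only a sketch: the temporary file and the column names `id` and `code` are hypothetical and unrelated to `brfss2021.csv`, and the assignment itself does not require suppressing the warnings. The `guess_max` and `col_types` arguments of `read_csv()` are the two usual ways to give the parser more information.

```r
library(readr)

# Hypothetical toy file: `code` is empty for the first 1,500 rows and
# only contains text after that, which makes type guessing harder.
tmp <- tempfile(fileext = ".csv")
toy <- data.frame(id = 1:2000, code = c(rep(NA, 1500), rep("a", 500)))
write_csv(toy, tmp)

# Option 1: let read_csv() inspect more rows before it guesses the types.
wide_guess <- read_csv(tmp, guess_max = 2000, show_col_types = FALSE)

# Option 2: declare the column types explicitly so nothing is guessed.
declared <- read_csv(
  tmp,
  col_types = cols(id = col_integer(), code = col_character())
)

# problems() lists any values that could not be parsed as the chosen type.
problems(declared)
```

For the assignment, the call shown in the Preliminaries section is the one to use; the warnings it produces are expected.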
### Preliminaries
```{r}
rm(list = ls())
library(tidyverse)
library(psych)
library(lm.beta)
# This will take a few moments to load since the file is so large.
brf <- read_csv("brfss2021.csv", show_col_types = FALSE)
```
------------------------------------------------------------------------
## Questions
------------------------------------------------------------------------
### Q1: We will be analyzing three variables (described below) in part 1 of this project. Identify the names of these variables using the codebook provided on Brightspace. Using the `brfss2021.csv` data provided on Brightspace, create a dataframe `brf_part1` with only these three columns (in the order they are listed below), which you will use for the following questions. Do not rename the variables. Store the first 10 rows in `Q1`.
- a variable that measures how often the respondent eats fruit (not including juices).
- a variable that records the length of time since the last routine medical checkup.
- a variable that records the general health of the respondent.
Once you have created the new `brf_part1`, you might consider removing the original dataframe from your environment to save space with `remove(brf)`. If you do this, however, and then run Q1 again, it will likely throw an error because `brf` no longer exists.
We encourage you to take note of the values of each of these three variables and familiarize yourself with them before continuing.
Hint: Your `brf_part1` dataframe should have the same number of rows as the original `brf` but now only 3 columns.
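
As a rough illustration of the pattern (not the answer), column selection with `select()` can be sketched on made-up data. The column names `VAR_A` through `VAR_D` are hypothetical placeholders; the real variable names must come from the codebook.

```r
library(dplyr)

# Hypothetical toy data; these are placeholder names, not BRFSS variables.
toy <- tibble(
  VAR_A = c(1, 2, 3),
  VAR_B = c("x", "y", "z"),
  VAR_C = c(TRUE, FALSE, TRUE),
  VAR_D = c(10, 20, 30)
)

# select() keeps only the named columns, in the order they are listed.
toy_small <- toy %>% select(VAR_C, VAR_A, VAR_D)

# slice_head() returns the first n rows without changing the columns.
first_two <- toy_small %>% slice_head(n = 2)
```

Selecting columns never drops rows, which is why the hint above expects the same row count as the original dataframe.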
```{r}
### Do not edit the following line. It is used by CodeGrade.
# CG Q1 #
### TYPE YOUR CODE BELOW ###
### VIEW OUTPUT ###
Q1
```
## Cleaning
### Q2: Clean the dataframe `brf_part1` by removing the respondents who "refused" or said "don't know/not sure", as well as any NAs, from both the health variable and the length of time variable. See the codebook for details on what the values of the variables mean. Overwrite the existing `brf_part1`. Sort the resulting dataframe by the general health variable (from excellent health to poor health). Store the first ten rows of the resulting dataframe as `Q2`.
(In practice, it would be wise to create a new dataframe, but we are trying to save space for CodeGrade and on your local device.)
Hint: The resulting `brf_part1` dataframe is 431,750 x 3.
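
The general cleaning pattern can be sketched on made-up data as follows. Everything here is an assumption for illustration: the column names `HEALTH` and `TIME` are placeholders, and the codes 7 and 9 are invented stand-ins for "don't know" and "refused". The codebook is the authority on the real codes.

```r
library(dplyr)

# Hypothetical toy data with placeholder column names.
toy <- tibble(
  HEALTH = c(3, 7, 1, NA, 9, 2, 5, 1),
  TIME   = c(1, 2, 9, 3, 1, NA, 4, 2)
)

# Assumption for this toy example only: 7 and 9 mark non-answers.
bad_codes <- c(7, 9)

toy_clean <- toy %>%
  # Drop rows where either column holds a non-answer code.
  filter(!HEALTH %in% bad_codes, !TIME %in% bad_codes) %>%
  # Drop rows where either column is missing.
  filter(!is.na(HEALTH), !is.na(TIME)) %>%
  # Sort ascending by the health column.
  arrange(HEALTH)
```

Note that `NA %in% bad_codes` is `FALSE`, so missing values survive the first `filter()` and need the separate `is.na()` check.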
```{r}
### Do not edit the following line. It is used by CodeGrade.
# CG Q2 #
### TYPE YOUR CODE BELOW ###
### VIEW OUTPUT ###
Q2
```
### Q3: How many people (and what percentage) reported that in general their health is either good or very good? Your answer should be a dataframe with two values: the number and the percentage. Round the percentage to the nearest tenth. Store it as `Q3`.
The percentage is out of the total number of observations for the `brf_part1` dataset.
Hint: The answer should look like this (note the column names):
```
Count Percent
```
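
One way to build a one-row count-and-percentage dataframe is sketched below on made-up data. The column name `HEALTH` and the choice of codes 2 and 3 as the categories of interest are assumptions for this toy example, not values taken from the codebook.

```r
library(dplyr)

# Hypothetical toy data with a placeholder column name.
toy <- tibble(HEALTH = c(1, 2, 2, 3, 3, 3, 4, 5))

toy_summary <- toy %>%
  summarise(
    # Count the rows whose value is in the set of interest.
    Count = sum(HEALTH %in% c(2, 3)),
    # n() is the total number of rows, so this is a percent of all rows.
    Percent = round(100 * Count / n(), 1)
  ) %>%
  as.data.frame()
```

Here 5 of the 8 toy rows match, so `Percent` is 62.5.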
```{r}
### Do not edit the following line. It is used by CodeGrade.
# CG Q3 #
### TYPE YOUR CODE BELOW ###
### VIEW OUTPUT ###
Q3
```
### Q4: Create a dataframe showing the number and the proportion of individuals who said their health is excellent, very good or good for each of the different lengths of times since last checkup. Store as a dataframe named `Q4`. Round to three decimal places.
The proportion is out of the total number of observations in the `brf_part1` dataset. If your proportions do not match those below, double-check your Q2 cleaning.
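
A grouped count with proportions of the whole dataset can be sketched on made-up data like this. The column names `HEALTH` and `TIME` and the codes 1 to 3 are hypothetical, and the sketch assumes the denominator is the total row count before filtering, as the sentence above describes.

```r
library(dplyr)

# Hypothetical toy data with placeholder column names.
toy <- tibble(
  HEALTH = c(1, 2, 4, 3, 5, 1, 2, 4),
  TIME   = c(1, 1, 1, 2, 2, 3, 3, 3)
)

# Capture the denominator before any rows are filtered out.
total_rows <- nrow(toy)

toy_by_time <- toy %>%
  filter(HEALTH %in% c(1, 2, 3)) %>%   # keep the categories of interest
  count(TIME) %>%                      # one row per TIME value, count in `n`
  mutate(proportion = round(n / total_rows, 3)) %>%
  as.data.frame()
```

Because the denominator is the full row count, the proportions do not sum to 1 unless every row passes the filter.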
Hint: The 5x3 dataframe should look like this. The `[...]` is the name of the length of time variable.
```
[...] n proportion
1
2
3
4
5
```
