Question

1 Approved Answer

Posted on Jun 27, 2024

3. Collect your data

We have been using the R library that contains NHANES data frequently in our class, but often our health research collaborators will give us a file with the dataset. Download the file "fram3.csv" from Module 10 on our Canvas page and put it in your working directory. CSV stands for "comma-separated values" and is commonly used for datasets (https://en.wikipedia.org/wiki/Comma-separated_values).

    Sample <- ... %>%
      mutate(DIABETES = as_factor(DIABETES))

    null_distribution <- ... %>%
      specify(formula = TOTCHOL ~ DIABETES) %>%
      hypothesize(null = "independence") %>%
      generate(reps = 5000, type = "permute") %>%
      calculate(stat = "diff in means", order = c("1", "0"))

    ## Warning: Removed 214 rows containing missing values.

Take a look at what is stored in null_distribution. It is just a tibble full of differences, which were created when we randomly labeled our 200 observations as either "diabetes" or "no-diabetes."

We can visualize the null distribution to see how extreme our observed sample is. Just like with confidence intervals, we use visualize(null_distribution, bins = 10) to see a histogram of the null distribution. We can shade in all values as extreme as or more extreme than our observed difference using shade_p_value(observed_diff, direction = "two-sided"). Note that we specify the "two-sided" choice here, corresponding to our two-sided null and alternative hypotheses, so both tails of the histogram are shaded.

6. Calculate the p-value

The p-value represents the probability of observing a summary statistic (e.g., a difference in means) as extreme as or more extreme than the statistic you observed, ASSUMING the null hypothesis is TRUE. So, if there were truly no difference in the average cholesterol of the population of people with and without diabetes, what is the probability that we would observe a difference in sample means equal to, or larger in magnitude than, our observed_diff?
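To make the permutation idea above concrete, here is a minimal base-R sketch of what shuffling the group labels and recomputing the difference in means looks like. It is only an illustration under stated assumptions: the data frame toy and its values are simulated (they are NOT the fram3.csv data), the helper diff_in_means is a made-up name, and the two-sided p-value is computed with absolute values to match the "equal to, or larger in magnitude than" wording, which may differ slightly from how infer computes it internally.

```r
# Toy data: simulated cholesterol values, NOT the fram3.csv data.
set.seed(112233)
toy <- data.frame(
  TOTCHOL  = c(rnorm(30, mean = 250, sd = 40), rnorm(170, mean = 235, sd = 40)),
  DIABETES = factor(rep(c("1", "0"), times = c(30, 170)))
)

# Hypothetical helper: mean of group "1" (diabetes) minus mean of group "0".
diff_in_means <- function(y, g) mean(y[g == "1"]) - mean(y[g == "0"])

observed_diff <- diff_in_means(toy$TOTCHOL, toy$DIABETES)

# Null distribution: shuffle the labels so that group membership is
# unrelated to cholesterol, then recompute the statistic each time.
reps <- 5000
null_diffs <- replicate(
  reps,
  diff_in_means(toy$TOTCHOL, sample(toy$DIABETES))
)

# Two-sided p-value: share of shuffled differences at least as large
# in magnitude as the observed difference.
p_value <- mean(abs(null_diffs) >= abs(observed_diff))

observed_diff
p_value
```

Because the labels are shuffled at random, the shuffled differences should be centered near 0, and the p-value is simply the fraction of them that land at least as far from 0 as observed_diff.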
We get the p-value using the command get_p_value(obs_stat = observed_diff, direction = "two-sided"). Note that we must indicate what our observed difference was and the fact that our null and alternative were "two-sided".

What is the p-value of your test?

8. Make your conclusion

Based on the p-value, do you Reject or Fail to reject your null hypothesis? What does this imply about the mean cholesterol level among people with diabetes compared to those without? Answer in complete sentences and include your α level.

2. Compare to a Confidence Interval

Calculate the "SE" confidence interval for the difference in the means, using the same concepts as before. In the specify() command you'll need to put in formula = TOTCHOL ~ DIABETES, and with a confidence interval we're still using bootstrap as the type of replications to generate. You DO NOT need a hypothesize() function in the confidence interval code (because there are no hypotheses involved in a confidence interval).

Interpret your interval. Does it AGREE or DISAGREE with your hypothesis test? It should agree: if your hypothesis test rejected the null, then 0 should not be in the 95% confidence interval for the difference in the means (i.e., you can rule out 0 as a plausible value for the true difference in the mean cholesterol). If you failed to reject, then 0 should be in your 95% confidence interval (i.e., you can NOT rule out 0 as a plausible value for the true difference in the mean cholesterol).

3. Do divorced people have more trouble sleeping than married people?

Suppose we wanted to test a single proportion (or mean) against a known or assumed value. For instance, recently a paper was published suggesting that the true human body temperature is no longer the assumed 98.6F, but has in fact decreased. A procedure like this involves defining a null hypothesis that compares your population parameter to a specified value.
For instance, in the above example, the null and alternative hypotheses might have been:

    H0: μ ≥ 98.6
    HA: μ < 98.6

If the alternative instead included the value 98.6, for example HA: μ ≤ 98.6, then rejecting the null would not indicate that the true mean is NOT 98.6, which is the whole point of this study.

3. We need to specify an assumed value for the true population parameter. Here, we assume the true population mean is 98.6.

4. The null and alternative are concerned with parameters (μ), not sample statistics (x̄).

For this example, we'll test a single proportion against an assumed value. The question of interest is "Do divorced people have more trouble sleeping than married people?" Note that this is a "directional" statement: "Do divorced people have MORE trouble sleeping than married people?"

We can identify the true population proportion of married people that have sleep trouble using dplyr and call this p_null, which will represent the assumed value for our population.

    p_null <- NHANES %>%
      filter(!is.na(SleepTrouble), Age > 17, MaritalStatus == "Married") %>%
      summarize(mean(SleepTrouble == "Yes")) %>%
      pull()

From NHANES, the true proportion of married people who report sleep trouble is 0.248. We would like to know: do divorced people have trouble sleeping at a higher rate? Let p represent the true population parameter for the proportion of divorced people who have trouble sleeping.

1. State your null and alternative hypotheses.

    H0: ____
    HA: ____

2. Assume that we're doing this test at the α = 0.05 level.

3. Collect your sample data

Take a sample of 25 divorced people over the age of 17.

4. Calculate your observed statistic

The statistic that corresponds to your null hypothesis, which only involves one parameter, is the estimate of the true proportion of divorced individuals having sleep trouble. Call this observed statistic p_hat, which is the proportion of individuals in your sample who respond "Yes" to having sleep trouble.

5. Generate the null distribution

Here the null distribution would be the distribution of potential values we might observe for the sample proportion of divorced people who have sleep trouble, p_hat, IF the true proportion were equal to the population value for married people, which we found above to be 0.248. So, we don't need to do any bootstrapping here, because we have a value of the TRUTH and we just need to simulate several draws of size 25 from this population.

To think of it another way, let's assume we're testing whether a coin is fair or not. In this case, our null hypothesis would be H0: p = 0.5. Let's say we flip the coin 25 times and get 9 Heads, for an estimate p_hat = 0.36. How odd is this outcome? Well, if we assume that the null is true, so that the probability of getting Heads on any given flip is 0.5, we can come up with a distribution of the number of Heads we might observe in 25 flips. Most of the time we'll get about 12, sometimes we might get 8, sometimes 20, etc. But there will be a distribution.

This is the same idea in our problem with divorced people and sleep. If the true proportion of divorced people with sleep problems is 0.248, then the distribution of p_hat we might observe will look like:

[Figure: histogram titled "Theoretical Distribution for p = 0.248", with p_hat on the horizontal axis]

This distribution is based solely on the assumed value for H0. To generate a null distribution like this, the only change we make is in the generate() statement, where type = "draw".

    generate(reps = 5000, type = "draw")

We can visualize this null distribution and shade our p-value using the same commands as before. Just remember, when shading the p-value, we're examining a directional hypothesis, so we want to change direction = "two-sided" to direction = "greater".

6. Calculate the p-value

Same as before, the p-value is still just the probability of observing a summary statistic as extreme as or more extreme than the observed statistic, assuming the null hypothesis is true, but again we're examining a directional hypothesis. So, make sure you change the "direction" accordingly.

7. Compare your p-value to your α level.

Is p < α?

    p < α -> REJECT THE NULL
    p ≥ α -> FAIL TO REJECT THE NULL

4. Do teenagers get the recommended amount of sleep?

The National Sleep Foundation recommends that teenagers (14-17 years old) get 8.5 hours of sleep per night. From a sample of size n = 35, we will investigate if this demographic in the NHANES population gets the recommended amount of sleep.

A. Draw a random sample of 35 teenagers (aged 14-17 inclusive). What is the average hours of sleep reported in your sample? What is the standard deviation?

You wish to test, at the α = 0.05 level, the claim that the true population average for the number of hours of sleep teenagers in the NHANES population get is equal to 8.5.

B. State the Null and Alternative hypotheses for your test.

C. Use the infer package to calculate the null distribution for your test. Store this as null_dist, and use the visualize() function to display a histogram of your null distribution, with the p-value shaded in. To generate this null distribution, you'll need hypothesize(null = "point", mu = ____), which tells R that you are testing against a specific value for a mean (and you'll need to enter that specific value in the mu = ____ part). For this type of test (a single sample) you'll also use "bootstrap" samples in your generate() command.

D. Use the infer package to calculate the p-value.

E. Based on your p-value and α level, make a conclusion regarding your null and alternative hypotheses.

    {r setup, include=FALSE}
    knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
    library(tidyverse)
    library(knitr)
    library(NHANES)
    library(infer)
    library(forcats)
    set.seed(112233)
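The single-sample workflow described in parts C and D can be sketched roughly as follows. This is a hedged illustration only, not a solution to the exercise: the data frame toy_teens is simulated rather than drawn from NHANES, the column name SleepHrsNight is assumed here for the sleep variable, 1000 replicates are used to keep it quick, and a two-sided direction is assumed because the stated claim is that the mean is equal to 8.5.

```r
library(dplyr)   # provides the %>% pipe
library(infer)

set.seed(112233)

# Simulated sleep hours for 35 teenagers; made-up values, NOT NHANES data.
toy_teens <- data.frame(
  SleepHrsNight = round(rnorm(35, mean = 8.0, sd = 1.3))
)

# Observed statistics from the (toy) sample.
x_bar <- mean(toy_teens$SleepHrsNight)
s     <- sd(toy_teens$SleepHrsNight)

# Null distribution of the sample mean under the point null mu = 8.5,
# built from bootstrap resamples as the handout describes.
null_dist <- toy_teens %>%
  specify(response = SleepHrsNight) %>%
  hypothesize(null = "point", mu = 8.5) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# p-value for the two-sided test of the claim that the mean equals 8.5.
p_val <- get_p_value(null_dist, obs_stat = x_bar, direction = "two-sided")
p_val

# Histogram of the null distribution with the p-value region shaded.
visualize(null_dist) +
  shade_p_value(obs_stat = x_bar, direction = "two-sided")
```

With real data, toy_teens would be replaced by the random sample from part A, and the resulting p-value would be compared to α = 0.05 as in part E.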