Question

Hypothesis Testing for Relationships

Preparation Tasks

A. i. Load the datafile my_glss6.rdata
   ii. Load the descr package
   iii. Load the UsingR package (you may need to install it first). The UsingR package contains real datasets useful for teaching and learning statistics.

C. Below is a codebook for the my_glss6 dataframe:

Nationally representative sample of Ghanaian households surveyed in 2012 to 2013

RELIGION  Religion - factor with 9 levels
HHEDLEV   Household head education level - factor with 9 levels
HHSIZE    Size of household - numerical
ELECTCON  Household connected to electricity grid - factor with 2 levels
EDLEV2    Collapsed variable for household head education level - factor with 4 levels:
          0 = no formal education
          1 = completed lower secondary or less
          2 = completed secondary or technical school
          3 = completed university
RELIG2    Collapsed variable for religion - factor with 4 levels:
          0 = Islam
          1 = Christian
          2 = Pentecostal/Charismatic
          3 = No Religion
          Note: Traditional and other categories were too small (n < 30), so they were converted to NA
UNIV      Collapsed variable for household head completed university or not:
          0 = household head did not complete university
          1 = household head completed university or higher

Homework Questions & Instructions

For the questions below, use R code chunks to show all calculations, data analysis, and graphics. Then add text below each code chunk to communicate the steps and the results. For hypothesis tests, you need to include text for each of the four steps in the process of formal hypothesis testing.

1. In this question, you will use a subset of the Ghana Living Standards Survey Round 6 (GLSS6) data to answer the question: "In Ghana, is there an association between size of a household and the level of education of the head of household?"

   (a) State the explanatory and response variables for the association, the role-type classification, and the name of the appropriate statistical test.
   (b) Generate a bivariate plot to visualize the association (use EDLEV2, the collapsed variable for level of education of the household head). Make sure the plot type is correct for the role-type classification stated in part (a). Give appropriate category and y-axis labels and a main title. Give a written description of the pattern observed in the graph.

   (c) Go through the four steps in the hypothesis test for the association between the level of education of the household head and household size. Write in text below the code chunk the necessary information and conclusions for each step. Don't forget to mention which type of error could have been made.

   (d) Conduct a post hoc test. What can you conclude?

2. In this question, you will continue with the same GLSS6 subset to explore the question: "In Ghana, is the religion of the household head associated with whether or not they completed university?"

   (a) State the explanatory and response variables for the association, the role-type classification, and the name of the appropriate statistical test.

   (b) Generate a bivariate plot to visualize the association. (Hint: the x-axis should display the categories of the explanatory variable (RELIG2) and the y-axis should show the proportion who completed university (use UNIV).) Give appropriate category and y-axis labels and a main title. Give a written description of the pattern observed in the graph.

   (c) Go through the four steps in the hypothesis test for the association between the religion of the household head and whether or not they completed university. Write in text below the code chunk the necessary information and conclusions for each step. Don't forget to mention which type of error could have been made.

   (d) Conduct a post hoc test. What can you conclude?

   (e) Discuss possible lurking variables in this association.

3. According to World Bank data posted on the data consolidation website Quandl, the household electrification rate in Ghana in 2008 was 60.50% (https://www.quandl.com/collections/society/household-electrification-rate-worldbankby-country). Based on the GLSS 6 survey completed in 2012/13, has the household electrification rate changed since 2008?

   (a) Use the variable ELECTCON in the GLSS6 dataset to test the claim that the proportion of households connected to the electricity grid has changed since 2008.

   (b) Calculate the 95% confidence interval estimate of the proportion of Ghanaian households connected to the electricity grid in 2012-13.

   (c) Explain how the confidence interval supports the results of the hypothesis test in part (a).

4. What if the only data you have is summarized in a contingency table? A contingency table displays frequencies corresponding to two variables, so it is a form of bivariate output. Articles in scientific journals, technical reports, and websites often present data in contingency tables. In this activity, you will conduct a statistical test on categorical data summarized in a contingency table.

   The chisq.test() function in R can accept raw data (this is what we have done so far) or data summarized in a table of the type generated by the table() function. So, when working with summarized data, the first step is to create the table, and the second is to use chisq.test() to determine if the two categorical variables are associated. Below is an example of how to create such a table in R. The function that will create the table is:

       array(vector of counts, vector of row-column dimensions, list(vector of row names, vector of column names))
   Consider the sample table below:

                Pass   Fail
       Group A    10     14
       Group B   417    145

       # first define the needed vectors
       counts <- c(10, 417, 14, 145)       # vector of counts: enter counts by first column, then second column
       dim <- c(2, 2)                       # vector of table dimensions: 2 rows x 2 columns in this case
       groups <- c("Group A", "Group B")   # vector of row labels
       result <- c("Pass", "Fail")         # vector of column labels

       # now define the table
       tbl <- array(counts, dim, list(groups, result)); tbl

       # run the chi-square test of independence
       chi_results <- chisq.test(tbl)
       chi_results

   You are a hospital administrator thinking about ways to cut costs and increase capacity (e.g. the number of patients at the hospital). In Ghana, it is typical for women and their infants to remain in the hospital for 1 week following normal childbirth (in other words, a birth with no complications). However, in many other countries, women and newborns are typically discharged from the hospital after 48 hours, and in the United States the norm is only 24 hours in the case of a normal birth with no complications. You decide to conduct a review of the literature on this topic in order to inform a potential change in the policy on discharge times following normal childbirth at the hospital.

   (a) Below is a table presented in one of the articles you read on the topic. At the 0.05 significance level, test the claim that whether the newborn was discharged early or late is independent of whether the newborn was re-hospitalized within a week of discharge.

       Rehospitalization within   Early Discharge         Late Discharge
       Week of Discharge          (less than 30 hours)    (30 to 78 hours)
       Yes                        622                     361
       No                         3997                    4660

       Based on data from "The Safety of Newborn Early Discharge," by Liu and others. Journal of the American Medical Association, Vol. 278, No. 4

   (b) Does the conclusion change if the level of significance is changed to 0.01?

   (c) How would the results of the test inform your decision regarding a new policy on discharge times?

5. Over the past year, the human resource manager at the hospital has run a series of three-month workshops aimed at increasing worker motivation and performance. To check the effectiveness of the workshops, she selected a random sample of 35 employees from the personnel files and recorded their most recent annual performance ratings, along with their ratings prior to attending the workshops. The data is given in the table below.

       Before  After    Before  After
       59      72       80      76
       72      74       70      80
       89      62       76      79
       67      74       78      88
       81      78       77      83
       88      86       74      83
       71      81       63      81
       67      72       62      76
       78      77       84      79
       64      85       71      81
       72      80       68      86
       89      80       88      89
       87      76       73      75
       69      86       77      71
       61      84       83      78
       82      80       82      78
       82      87       60      94
       65      82

   (a) State the explanatory and response variables for the association, the role-type classification, and the name of the appropriate statistical test.

   (b) Use the formal process of hypothesis testing to determine if the workshops for employees at the hospital appear to be improving performance.

   (c) Calculate 95% confidence intervals for the ratings before and after the workshops.

   (d) Explain how the confidence intervals support the conclusion from part (b).

6. This activity will use the babies dataset in the UsingR library to investigate the association between gestation time and birth weight moderated by maternal smoking. In the console, type ?babies to see a description of the dataset. (Don't put ?babies in the RMarkdown file!)

   (a) Data management:
       i. Create a subset of variables, called my_babies, containing the variables gestation, wt, and smoke.
       ii. Generate a simple scatterplot of the relationship between gestation time and birth weight. Use plot(my_babies$gestation, my_babies$wt). Note that error codes are clearly visible in the plot. Recode gestation and wt so that the error code 999 is assigned to NA.
       iii. Create a secondary variable, called my_babies$sm, that is assigned 1 if smoke == 1 (smokes now) and is assigned 0 otherwise. Tell R that the codes are factors: my_babies$sm <- factor(my_babies$sm).
   (b) Data visualization:
       i. Generate a scatterplot of birth weight as a function of gestation for mothers who did not smoke during pregnancy:

              plot(my_babies$wt[my_babies$sm==0] ~ my_babies$gestation[my_babies$sm==0],
                   main="title", xlab="label", ylab="label", col="color1")

       ii. Add a regression line:

              abline(lm(my_babies$wt[my_babies$sm==0] ~ my_babies$gestation[my_babies$sm==0]),
                     lwd=2, col="color1")

       iii. Now add the points and a second regression line for mothers who smoked during pregnancy:

              points(my_babies$wt[my_babies$sm==1] ~ my_babies$gestation[my_babies$sm==1],
                     col="color2")
              abline(lm(my_babies$wt[my_babies$sm==1] ~ my_babies$gestation[my_babies$sm==1]),
                     lwd=2, col="color2")

       iv. And finally, add a legend:

              legend(150, 175, legend=c("category1 text", "category2 text"),
                     fill=c("color1", "color2"), cex=0.75)

   (c) Conduct hypothesis tests for each smoking category separately to determine if there is a linear relationship and the strength of the linear relationship in each case. Document each step in the hypothesis test. The code for calculating the test statistic and p-value for the first smoking category is:

              cor_results <- cor.test(my_babies$wt[my_babies$sm==1], my_babies$gestation[my_babies$sm==1])
              cor_results

   (d) Summarize the results of the research.
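
The data-management steps in 6(a) and the correlation test in 6(c) can be tried out first on a small made-up data frame. The sketch below is only an illustration under stated assumptions: every value in toy is invented and does not come from the babies dataset, and the names toy, my_toy, and recode_999 are hypothetical. Only the column names gestation, wt, and smoke, the error code 999, and the sm coding come from the question.

```r
# Toy stand-in for the babies data; all values are invented for illustration.
toy <- data.frame(
  gestation = c(284, 282, 999, 279, 286, 244, 290, 275, 268, 281),
  wt        = c(120, 113, 128, 999, 136, 108, 140, 115, 110, 118),
  smoke     = c(0, 1, 1, 0, 2, 1, 0, 1, 1, 3)
)

# Keep only the three variables of interest (mirrors step 6(a)i).
my_toy <- toy[, c("gestation", "wt", "smoke")]

# Hypothetical helper: replace the error code 999 with NA (step 6(a)ii).
# %in% is used so that values that are already NA are left untouched.
recode_999 <- function(x) {
  x[x %in% 999] <- NA
  x
}
my_toy$gestation <- recode_999(my_toy$gestation)
my_toy$wt        <- recode_999(my_toy$wt)

# Secondary variable: 1 if smoke == 1, 0 otherwise, stored as a factor (step 6(a)iii).
my_toy$sm <- factor(ifelse(my_toy$smoke == 1, 1, 0))

# Correlation test for one smoking group (step 6(c)).
# cor.test() uses only the pairs in which both values are present.
smokers <- my_toy[my_toy$sm == 1, ]
cor_results <- cor.test(smokers$wt, smokers$gestation)
cor_results
```

With real data, the same pattern would apply to my_babies; the toy group here is far too small for the test result to mean anything, so it only checks that the recoding and subsetting behave as intended.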
