Question

1 Approved Answer

Posted on May 19, 2024

Task 7 Histogram for schooling. Make a histogram for schooling with frequency on the y-axis, where the intervals are left closed, right open. Set the breaks to be the values 3, 7, 11, ..., 23. Make the limits on the y-axis go from 0 to 100.
(a) Report your code.
(b) Upload your plot to Gradescope.
(c) Describe your histogram. State whether it is relatively symmetric, skewed, or neither of these options. State whether it is unimodal, bimodal, or multimodal.

Task 8 Boxplot for schooling. Make a vertical boxplot for schooling. Set the y-axis limits to be from 0 to 25. Add the title "Boxplot of Schooling". Are there outliers present?
(a) Report your code.
(b) Upload your plot to Gradescope.
(c) Answer whether there are outliers or not.

Task 9 Shapiro-Wilk Test for schooling. Run the Shapiro-Wilk Test for schooling. Does it appear that this variable is normally distributed?
(a) Report your code.
(b) Provide your test results as a comment within your code.
(c) One of the result values that you obtained is a p-value. Assume that if your p-value < 0.02, the data is not normally distributed. Based on what you see, do you think that your data is normally distributed? Why or why not?
(d) Does your decision in Part (c) match what you are seeing with your histogram from Task 7? Why or why not?

Task 10 We want to compare the average schooling for the Americas continent to the average schooling for continents that are not Americas. Create a 97.8% confidence interval for Americas - Not Americas, assuming equal variances.
(a) We need to first split the dataset into two vectors that you can use for analysis. To do this, you may use the code below. You will need to change the name of the dataset to correspond to how you named your dataset in Task 1.
    americas <- datasetname$schooling[datasetname$continent == "Americas"]
    NOTamericas <- datasetname$schooling[datasetname$continent != "Americas"]
(b) Find the confidence interval requested. Report your code.
(c) Provide your results as a comment within your code.
(d) State the parameter the confidence interval is for.
(e) Write down the confidence interval.
(f) Write an interpretation of your confidence interval (We are xx% confident ...).
(g) Suppose we are interested in whether there is a difference of 1 year of schooling between the two continent types (Americas - Not Americas = 1). Does this value seem plausible (like it could happen)? Why or why not? Your answer should reference your confidence interval.

Task 11 Create a 97.3% confidence interval for the proportion of countries whose prevalence of thinness among children 5-9 years old (variable thin_child) is "Low".
(a) To help you determine the number of countries whose prevalence of thinness (thin_child) is "Low", and the total number of countries, copy / paste / run the code below in R. You will need to change the name of the dataset to correspond to how you named your dataset in Task 1.
    addmargins(table(datasetname$thin_child))
Note: The column called Sum gives you the total number of countries.
(b) Check the success / failure condition. Report the expected number of successes and the expected number of failures. Based on this information, can we use the Normal Distribution to approximate the confidence interval?
(c) Find the confidence interval by using the large sample option without a continuity correction. Report your code.
(d) Provide your results as a comment within your code.
(e) State the parameter the confidence interval is for.
(f) Write down the confidence interval.

Task 12 Create a 90.7% confidence interval for the variance of schooling.
(a) Report your code.
(b) Provide your results as a comment within your code.
(c) State the parameter the confidence interval is for.
(d) Write down the confidence interval.
(e) What assumption did we need to make to be able to construct this confidence interval? Do you think that this assumption is met? You should reference an earlier Task from this project to answer this question.

Task 13 One might be concerned that the population mean of life.expectancy is less than 73 years. Conduct a hypothesis test at the 3.1% significance level to determine if this is the case.
(a) What condition(s) must you satisfy to perform this hypothesis test? Do you think the condition(s) is(are) met? Why or why not?
(b) State the hypotheses.
(c) Report your code.
(d) Provide your results as a comment within your code.
(e) Provide the test statistic value. You must state the specific value.
(f) Provide the p-value. You must state the specific value.
(g) State your decision (Reject H0 / Do Not Reject H0) based on the p-value and the significance level.
(h) State your conclusion in the context of the problem.
(i) Suppose you wanted to find the critical region for this test. State the critical value, and then state the critical region. Include all code required to obtain these values. Do NOT use a table or your calculator. Note: When you write your code, you must obtain the exact critical value. You should not have to modify the value after you find it.
(j) Using your critical region from above, state your decision (Reject H0 / Do Not Reject H0). Did you make the same decision that you did with your p-value?
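
For anyone working through Tasks 10 to 13, here is a rough sketch of how the interval and test calls might look in base R. It is not the approved answer. The data frame dat is simulated (the real project dataset is not shown in the question), so every number it produces is meaningless. It also assumes that a hand-computed Wald interval is what "large sample option without a continuity correction" means, and that a one-sample t test is the intended procedure for Task 13. Your course may use different functions.

```r
# Simulated stand-in data (hypothetical); replace dat with your own dataset.
set.seed(1)
dat <- data.frame(
  schooling       = rnorm(150, mean = 13, sd = 3),
  life.expectancy = rnorm(150, mean = 71, sd = 8),
  continent       = sample(c("Americas", "Africa", "Asia", "Europe"), 150, replace = TRUE),
  thin_child      = sample(c("Low", "Medium", "High"), 150, replace = TRUE)
)

# Task 10: pooled-variance t interval for mean(Americas) - mean(Not Americas).
americas    <- dat$schooling[dat$continent == "Americas"]
NOTamericas <- dat$schooling[dat$continent != "Americas"]
ci_diff <- t.test(americas, NOTamericas, var.equal = TRUE, conf.level = 0.978)$conf.int

# Task 11: success/failure counts, then a Wald (normal approximation) interval.
x <- sum(dat$thin_child == "Low")   # observed successes
n <- length(dat$thin_child)         # total number of countries
p_hat <- x / n
successes <- n * p_hat              # compare both against your course's threshold
failures  <- n * (1 - p_hat)
z <- qnorm(1 - (1 - 0.973) / 2)
ci_wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)

# Task 12: chi-square interval for the variance (relies on normality of schooling).
s     <- dat$schooling
n_s   <- length(s)
alpha <- 1 - 0.907
ci_var <- (n_s - 1) * var(s) / qchisq(c(1 - alpha / 2, alpha / 2), df = n_s - 1)

# Task 13: left-tailed one-sample t test of H0: mu = 73 vs Ha: mu < 73.
le   <- dat$life.expectancy
tt   <- t.test(le, mu = 73, alternative = "less")
crit <- qt(0.031, df = length(le) - 1)   # critical region: t < crit
reject_p    <- tt$p.value < 0.031
reject_crit <- unname(tt$statistic) < crit

print(ci_diff); print(ci_wald); print(ci_var); print(tt); print(crit)
```

If you use prop.test(x, n, conf.level = 0.973, correct = FALSE) instead, R returns a Wilson score interval, which will differ slightly from the Wald interval above. The p-value decision and the critical-region decision in Task 13 should always agree, since they are two views of the same test.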