Question

Posted on Jun 27, 2024

Exercise 2 (16 points) Consider the Wage data set, available by running the following commands (the data set will appear under the name "Wage" in the top right-hand corner; make sure you install the package first):

    # install.packages("ISLR")
    library(ISLR)
    data(Wage)

(a.) Consider the health_ins variable. Create a binary variable called insurance using the ifelse function. That variable is equal to 1 if health_ins is "Yes", and 0 if it is "No". Compute the sample proportion of observations that have health insurance and call it phat. Estimate the standard deviation of phat and call it sd_phat. Show all the values. (2 points)
Answer: # Answer here

(b.) Test the hypothesis that the true proportion of people who have health insurance, p, is equal to 75% against the alternative that it is different from 75%, with a significance level of 10%. Show your computations in R and clearly state your decision to reject or not reject. (2 points)
Answer: # Answer here

(c.) Look at the wage variable. Compute its sample average and standard deviation. Using the sampling distribution of the sample average, compute the probability that the average wage is bigger than $112 (the sample is considered big enough). Show your computations in R. (2 points)
Answer: # Answer here

(d.) Test the hypothesis that the true average wage is equal to $110 against the alternative that it is bigger than $110, with a 2.5% significance level. Show the test statistic or p-value. (2 points)
Answer: # Answer here

(e.) Construct a 95% confidence interval for the true average wage. Is the value $110 included? Relate your answer to your decision in the previous question. (2 points)
Answer: # Answer here

(f.) We are interested in the difference in average wages across the jobclass variable. Compute the sample average of the wage variable for Industrial jobs (call it wbar_industrial) and for Information jobs (call it wbar_information).
Estimate the variance of the sample averages as well (call them w_bar_var_industrial and w_bar_var_information respectively) and show them all. The filter command from the dplyr package will be useful, as it extracts the observations where a variable has a specific value. (The dplyr package is included in the tidyverse package, so there is no need to install it separately. In the code chunk, type dplyr::filter if you want to use filter; this makes sure the filter command from the dplyr package is used, as opposed to the filter command from another package.) You can extract two data sets according to whether jobclass is Industrial or Information. (3 points)
Answer: # Answer here

(g.) We want to test the hypothesis that the difference in wages between the two job classes is equal to 0, against the alternative hypothesis that this difference is not equal to 0. The significance level is 5%. Show your computations in R (show either the test statistic or the p-value) and clearly state your decision to reject or not reject. (3 points)
Answer: # Answer here
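
One possible way to work through parts (a.) to (g.) is sketched below. It is not an official solution, and it rests on a few assumptions that are not stated in the exercise: the ISLR and dplyr packages are installed; the factor levels in Wage are labelled "1. Yes" / "2. No" and "1. Industrial" / "2. Information" (so the code matches on the word rather than on the exact string "Yes"); all tests use the large-sample normal approximation; and the two job-class samples are treated as independent with unpooled variances. Names such as se_wbar, z_b, p_value_b, and ci are introduced here for illustration only.

```r
library(ISLR)
data(Wage)
n <- nrow(Wage)

# (a) Binary indicator: 1 if the health_ins label contains "Yes", else 0.
# grepl() is used because the labels are assumed to carry a numeric prefix.
insurance <- ifelse(grepl("Yes", Wage$health_ins), 1, 0)
phat <- mean(insurance)
sd_phat <- sqrt(phat * (1 - phat) / n)   # estimated standard deviation of phat

# (b) Two-sided z-test of H0: p = 0.75 at the 10% level.
# The standard error is computed under H0, i.e. with p0 rather than phat.
p0 <- 0.75
z_b <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
p_value_b <- 2 * pnorm(-abs(z_b))
reject_b <- p_value_b < 0.10

# (c) Sample mean and sd of wage; P(average wage > 112) using the
# approximate sampling distribution N(wbar, s^2 / n).
wbar <- mean(Wage$wage)
s <- sd(Wage$wage)
se_wbar <- s / sqrt(n)
prob_c <- pnorm(112, mean = wbar, sd = se_wbar, lower.tail = FALSE)

# (d) One-sided test of H0: mu = 110 vs H1: mu > 110 at the 2.5% level.
z_d <- (wbar - 110) / se_wbar
p_value_d <- pnorm(z_d, lower.tail = FALSE)
reject_d <- p_value_d < 0.025

# (e) 95% confidence interval. It uses the same critical value as (d),
# so 110 lies below the lower bound exactly when (d) rejects.
ci <- wbar + c(-1, 1) * qnorm(0.975) * se_wbar
includes_110 <- ci[1] <= 110 && 110 <= ci[2]

# (f) Split by job class with dplyr::filter, then compute the group means
# and the estimated variances of those means (sample variance / group size).
industrial  <- dplyr::filter(Wage, grepl("Industrial", jobclass))
information <- dplyr::filter(Wage, grepl("Information", jobclass))
wbar_industrial  <- mean(industrial$wage)
wbar_information <- mean(information$wage)
w_bar_var_industrial  <- var(industrial$wage) / nrow(industrial)
w_bar_var_information <- var(information$wage) / nrow(information)

# (g) Two-sided z-test of H0: difference in means = 0 at the 5% level.
z_g <- (wbar_information - wbar_industrial) /
  sqrt(w_bar_var_industrial + w_bar_var_information)
p_value_g <- 2 * pnorm(-abs(z_g))
reject_g <- p_value_g < 0.05

# Show all the values
print(c(phat = phat, sd_phat = sd_phat, z_b = z_b, p_value_b = p_value_b))
print(c(wbar = wbar, s = s, prob_c = prob_c, z_d = z_d, p_value_d = p_value_d))
print(ci)
print(c(wbar_industrial = wbar_industrial, wbar_information = wbar_information,
        w_bar_var_industrial = w_bar_var_industrial,
        w_bar_var_information = w_bar_var_information,
        z_g = z_g, p_value_g = p_value_g))
```

The logical values reject_b, reject_d, and reject_g give the reject / do not reject decisions at the stated significance levels. If the course expects t-based tests instead of the normal approximation, t.test(Wage$wage, mu = 110, alternative = "greater") and t.test(wage ~ jobclass, data = Wage) are the closest built-in equivalents for (d.) and (g.).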