Question

1 Approved Answer

Posted on Jun 27, 2024

MAT*H167: Principles of Statistics

The attached spreadsheet contains data on the year, experience level, and salary in U.S. dollars for random samples of 79 entry-level and 206 mid-level data science jobs from 2020-2022. Suppose that you would like to know if the mean salary for all mid-level data science jobs is greater than the mean salary for all entry-level data science jobs.

1. [4] Describe the two samples selected.

Sample 1: 79 entry-level data science jobs from 2020-2022.
Sample 2: 206 mid-level data science jobs from 2020-2022.

2. [4] Describe the two populations of interest.

The populations of interest are:
1. All entry-level data science jobs from 2020-2022 in the U.S.
2. All mid-level data science jobs from 2020-2022 in the U.S.

3. [4] Describe the two parameters of interest.

The parameters of interest are:
1. The mean salary for all entry-level data science jobs from 2020-2022.
2. The mean salary for all mid-level data science jobs from 2020-2022.
The question is whether the second parameter is greater than the first.

4. [10] What null and alternative hypotheses should you test regarding the two parameters of interest?

H0 is the null hypothesis and Ha is the alternative hypothesis. The alternative reflects what the researcher is looking for, so we state it first. Here the researcher wants to find out whether the mean salary for all mid-level data science jobs is greater than the mean salary for all entry-level data science jobs, so:

Ha: mean salary for all mid-level data science jobs in 2020-2022 > mean salary for all entry-level data science jobs in 2020-2022
H0: mean salary for all mid-level data science jobs in 2020-2022 = mean salary for all entry-level data science jobs in 2020-2022

From the calculations in my Excel spreadsheet, the mean salary for the sample of mid-level data science jobs in 2020-2022 is 88403.17. The test statistic below uses 88403.17 as the hypothesized mean.

5. [4+4] List the four conditions the data must satisfy in order for us to be able to compute a confidence interval AND briefly explain why each condition is satisfied.

To compute a confidence interval, the data must satisfy four conditions:
1. The samples are random or representative. The text states that both the mid-level and entry-level samples are random, so this condition is satisfied.
2. Each sample size is over 30. Our samples contain 206 mid-level and 79 entry-level data science jobs, so both are definitely over 30 and this condition is satisfied.
3. The two samples are independent. Both samples are independent of each other, so this condition is satisfied.
4. Each sample is less than 10% of its population. Ten times the sample sizes gives 2060 and 790, and each population is larger than that, so this condition is satisfied.
All four conditions have been satisfied, so we can proceed.

6. [5] What is the value of the test statistic?

The value of the test statistic is t = -3.594798 (3.594798 in absolute value). I obtained this result in my Excel spreadsheet with the formula

(sample mean - hypothesized mean) / (sample st. dev. / sqrt(sample size))

where the hypothesized mean is the value from the null hypothesis. This means that the sample mean is about 3.59 standard errors away from 88403.17, the value we would expect if the null hypothesis were true.

7. [5] What is the P-value?

The p-value is 0.000567. The p-value is the probability of observing a sample mean as unlikely or more unlikely than the one we computed from the sample collected, assuming the null hypothesis is true. I got this value from the following calculation in the attached Excel file, using 78 degrees of freedom (sample size 79 minus 1):

=TDIST(3.594798, 78, 2) = 0.000567

The arguments are TDIST(absolute value of t, degrees of freedom, tails).

8. [4] Should you reject the null hypothesis in favor of the alternative or fail to reject the null hypothesis in favor of the alternative? Test at the 0.05 significance level.

The p-value of 0.000567 is less than the 0.05 significance level, so we reject the null hypothesis in favor of the alternative.

9. [2+2] Is there significant evidence to support the claim that the mean salary for all mid-level data science jobs is greater than the mean salary for all entry-level data science jobs? Explain your reasoning below.

Yes. I believe there is enough evidence to support the claim that the mean salary for all mid-level data science jobs from 2020-2022 exceeds the mean salary for all entry-level data science jobs. I determined this from the t-test run on the given data, which used the mid-level mean salary (88403.17) as the hypothesized value. The test used a 0.05 significance level and produced a p-value of 0.000567. Because the p-value is less than the significance level, we reject the null hypothesis and conclude that there is significant evidence to support the claim that the mean salary for all mid-level data science jobs is greater than the mean salary for all entry-level data science jobs.
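
The arithmetic behind questions 6 and 7 can be sketched in Python as a cross-check on the spreadsheet. This is only a sketch under stated assumptions: the function names are hypothetical, the t-distribution tail area is approximated by numerical integration rather than taken from Excel, and the spreadsheet data is not reproduced here, so only the reported t = -3.594798 and 78 degrees of freedom are reused. The block also includes Welch's two-sample statistic, which is a different technique from the one-sample formula used in the answer and is one common way to compare two independent sample means directly. Its inputs in the example call are made-up placeholders, not values from the spreadsheet.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) with Simpson's rule on [0, |t|].

    Plays the role of Excel's TDIST(|t|, df, 2); steps must be even.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area under the density between 0 and |t|
    return 1.0 - 2.0 * central


def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """(sample mean - hypothesized mean) / (sample sd / sqrt(n))."""
    return (sample_mean - null_mean) / (sample_sd / math.sqrt(n))


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Reported in the answer: t = -3.594798 with 78 degrees of freedom.
    p = two_tailed_p(-3.594798, 78)
    print(f"two-tailed p-value: {p:.6f}")
    print("reject H0 at 0.05:", p < 0.05)

    # HYPOTHETICAL placeholder summaries (not from the spreadsheet),
    # shown only to illustrate the call: mid-level first, entry-level second.
    t2, df2 = welch_t(100.0, 20.0, 206, 90.0, 15.0, 79)
    print(f"Welch t = {t2:.3f}, df = {df2:.1f}")
```

For a one-sided alternative such as "mid-level mean is greater", the two-tailed value is halved when the statistic points in the direction of the alternative.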