Answered step by step
Verified Expert Solution
Question
1 Approved Answer
In 2009, a large Midwestern University wanted to give a report to the state's board of regents to justify the continued expenditures for their study
In 2009, a large Midwestern University wanted to give a report to the state's board of regents to justify the continued expenditures for their study abroad programs. In particular, they wanted to show that students who studied abroad had better language proficiency than their peers who did not study abroad. Every entering student is required to take a language proficiency/placement exam at the beginning of their first year. Four years later, the study abroad office took a random sample of 1000 students who had completed beyond the 3 semesters minimum required for their general education credits and administered an additional proficiency exam to those students. Of the 1000 students, 221 students had completed one or more semesters abroad, whereas 779 had not done no study abroad. Of the 221 who studied abroad, 193 were found to be proficient or above in their language skills, whereas 534 of the students who had not studied abroad demonstrated proficient or above language skills. a) (3 points) Can this study be considered an experiment? Why or why not? This study is not an experiment. Because an experiment has random assignment and a researcher manipulates the Independent Variable, allows researcher to infer causation. There is not manipulation in this study. This would be an observation. b) (8 points) Let p1 represent the proportion of the proficient students who studied abroad. Let p2 represent the proportion of proficient students who did not study abroad. Find the values for each proportion, then list the additional information necessary to construct a 90 percent confidence interval for p1 - p2 , including conditions and the formula. Then compute the confidence interval. 193/221=0.8733 534/779=0.6855 Standard error = .02788 ME = .04589 L limit = 0.14195 U Limit = 0.23367 Z score = 5.5314 c) (3 points) Using the confidence interval you reported in part (a), can you reject the null hypothesis that the proportion of students who are language proficient with study abroad is the same as the proportion of students who are language proficient without study abroad? Explain. The p value is <.0001 so with this information we can reject the null hypothesis. The standard error are not within the upper and lower limits we conclude that there is significant evidence that there is a difference between the study abroad and those who do not. d) (2 points) Give at least one good reason why it would have been better if the number of students in the study who had studied abroad would have been equal to the number of students who had not studied abroad. 1 It would have been better for the two groups to be equal because having two different sizes causes more variance. It also can cause it to be skewed and sometimes violate normality. Question 2 (24 points) A research laboratory is developing new compounds to provide relief from a specific allergy. In an experiment with 60 volunteer subjects who suffered from the allergy, the amounts of two active ingredients (ingredient A and ingredient B) were varied. Two levels (0.2g and 0.4g) were used for ingredient A and three levels (1g, 2g and 3g) were used for ingredient B. Consequently, six different compounds were made from the 2 3=6 combinations of levels for these two ingredients. The 60 subjects were randomly assigned to the six compounds so that 10 subjects were assigned to each of the six compounds. Each subject took a pill containing the assigned compound and the number of hours of relief from allergy symptoms was recorded for each subject. a) (3 points) Identify the experimental units (the who), the treatments, and the response that was measured on each subject. Individuals with suffered from the allergy. Treatment were 6 levels of compounds containing different dosages of ingredient A and ingredient B. The response number of hours of relief from allergy symptoms. b) (2 points) Randomization was used in this experiment. What is the reason (or reasons) for randomly assigning subjects to treatments? The reason for randomly assigning subjects to treatments is so it eliminates the possibility for any biases. c) (6 points) Considering the experimental design, list at least three scientific questions which the national chain would want to answer from the data generated by this study. For each question, write an appropriate null hypothesis and an appropriate alternative hypothesis. Does higher dosage of A and B give more relief than lower dosages? Does Ingredient B offer more relief than ingredient A? Does Ingredient A offer more relief than ingredient B? H o=0 H A 0 H o=a1+ a2 +a3 =0 H A a1 +a2 +a 3 0 H o= 1 + 2+ 3 =0 H A 1 + 2 + 3 0 d) (8 points) Outline an analysis of variance that would be useful for testing some or all of the hypotheses you presented in part (c). You should have one line in your ANOVA table for each source of variation and you should report the value for the corresponding degrees of freedom. Using the outline of the ANOVA table, describe how you would test some or all of the hypotheses you presented in part (c). (Note that you do not have any data, so you cannot compute numerical values for sums of squares, mean squares, or test statistics.) Source of SS Df MS F P variation Ingriedient A 1 2 Ingredient B Ingredient A x B Within Total 2 5 54 62 e) (3 points) Although blocking was not done in this experiment, describe the potential benefit of using blocking in an experiment. Identify a potential blocking factor for this experiment, explain why it would be a good blocking factor, and describe how the experiment could be performed as a randomized block design. By blocking you can isolate the variability attributive to the difference between blocks, so you can see the difference caused by the treatment more clearly. IT also helps to avoid nuisance factors. A potential blocking factor one could use is blocking all the levels with ingredient A at 0.2g, and block all the levels with ingredient A at 0.4g. by doing this one can see if there is a significant difference between the two different levels of ingredient A. f) (2 points) Describe what would need to be done to make this a double blind experiment. To make this a double blind experiment the person receiving the treatment wouldn't know what level of treatment they are receiving and the person administering the treatment wouldn't know what treatment the participant is receiving either. Question 3: (12 points) A random poll of voting age citizens in Montana was conducted to gauge the current partisan make up of the state to prepare for the upcoming primary election. Participants were asked to identify their gender and party affiliation. Gender Men Women Democrat 36 48 Independent Republican 24 45 16 33 Do these data suggest that there are significant differences in the distribution of partisan affiliations for men and women? Perform an appropriate test to address this question. Your response should include: i) ii) iii) A precise statement of the null hypothesis and the alternative hypothesis. H O= 12=0 H A 12 0 Checks of the conditions for inference. Randomization is met. The participants are from a random poll of voting age citizens. Normality is met because it has a large sample, n=202. 10 % is met its less than 10% of the entire population. 3 iv) The formula and value of your test statistic, relevant degrees of freedom, and a p-value. v) A clear statement of your conclusion in the context of this study. Question 4: (48 points) As part of a study of student performance at a large university, data were collected on a random sample of freshman computer science majors. Of particular interest was the cumulative grade point average (GPA) at the end of each student's first three semesters at the university. Other information recorded on each student at the time the student enrolled at the university includes average high school grades in mathematics (HSM), average high school grades in science (HSS), and average high school grades in English and communication courses (HSE). Researchers at the university were interested in predicting the GPA's for computer science majors at the end of first three semesters of enrollment from the information on high school grades. In this data set, high school grades were coded on a scale from 1 to 10, with 10 corresponding to an A, 9 to a A-, 8 to a B+, etc. At this university, GPA's are recorded on a scale from 0 to 6, with 6 corresponding to a straight A performance. Results for 224 computer science majors were included in this study; there were 145 men and 79 women. a) The researchers wanted to know if there was a significant difference between the average GPA's at the end of three semesters of study for men and women computer science majors. They created the following box plot and compiled the following summary statistics. 4 Gender Men Women GPA Summary Statistics Standard Number Mean Deviation 145 4.6077 0.8068 79 4.6857 0.7288 95% Confidence Intervals Lower Upper 4.5225 4.8489 4.4753 4.7402 i) (4 points) What information is provided by the side-by-side box plots? Do conditions for inference appear to be satisfied? The box plots show us that the center of distribution is about the same, the median is about the same, variation is about the same and the IQR are about the same. It shows us that could be a little skewed to the left being its towards the top of the whisker. So it shows us we have normal distribution. And our normality is met. ii) (2 points) State the null hypothesis and the alternative hypothesis that the researchers should use to answer their question. 12=0 12 0 iii) (2 points) Explain why a two sample t-test should be used in this situation instead of a paired t-test. A two sample t test should be used in this instance because of several factors that could be part of this study. Some examples would be some students entering into college not taking computer classes before or all of them taking different classes from different schools. Another factor to take into consideration is students taking different classes each semester so one is not comparing the same class load with each other. Another reason is because of different levels of education when entering into the school. iv) (6 points) Perform the t-test. Report the value the test statistic, its degrees of freedom, and report a p-value. State your conclusion in the context of this study. T-stat = -0.7148512 Df = 222 p-Value = 0.4755 Stndard error = 0.10911361 v) (4 points) Explain why checking if the 95% confidence interval for the mean GPA for men overlaps with the 95% confidence interval for the mean GPA for women is not an appropriate method for testing the null hypothesis you stated in part (ii). b) (4 points) As a first step in examining the relationship between GPA after the first three semesters at the university and average high school grades in mathematics (HSM), science (HSS) and English (HSE), the researchers computed the following correlation results and the following scatterplot matrix. GPA HSM HSS GPA 1.0000 0.4365 0.3294 Correlations HSM 0.4365 1.0000 0.5757 5 HSS 0.3294 0.5757 1.0000 HSE 0.2890 0.4469 0.5794 HSE Variable HSM HSS HSS HSE HSE HSE Correlations GPA 0.2890 by Variable GPA GPA HSM GPA HSM HSS HSM 0.4469 HSS 0.5794 Pairwise Correlations Correlation 0.4365 0.3294 0.5757 0.2890 0.4469 0.5794 Count 224 224 224 224 224 224 Lower 95% 0.3240 0.2073 0.4809 0.1641 0.3355 0.4851 HSE 1.0000 Upper 95% 0.5369 0.4414 0.6572 0.4048 0.5460 0.6603 Signif Prob <.0001* <.0001* <.0001* <.0001* <.0001* <.0001* Summarize what these results tell you about relationships between the four variables, GPA, HSM, HSS, and HSE. c) The following results (produced by JMP) are from the regression of GPA on HSM, HSS, and HSE. Analysis of Variance Source Model Error C. Total DF 3 220 223 Sum of Squares 27.71233 107.75046 135.46279 Mean Square 9.23744 0.48977 6 F Ratio 18.8606 Prob > F <.0001 R-Square 0.2046 Parameter Estimates Term Intercept HSM HSS HSE i) (2 points) Interpret the Estimate 2.5898766 0.1685666 0.0343156 0.0451018 R 2 Std Error 0.294243 0.035492 0.037559 0.038696 t Ratio 8.80 4.75 0.91 1.17 Prob>|t| <.0001 <.0001 0.3619 0.2451 value. ii) (6 points) Write out the formula for the equation for predicting GPA's for computer science students at the end of the first three semesters of enrollment from high school academic performance as summarized by the HSM, HSS, and HSE variables. Interpret the estimates of the regression coefficients for this model. iii) (4 points) The coefficients for HSS and HSE are not statistically significant for this model. Does this imply that neither HHS nor HSE provide any information for predicting GPA? Should both HSS and HSE be deleted from the model? Explain. d) (4 points) Two of the partial residual (leverage) plots produced by JMP are shown below. Summarize the information in these plots. e) (4 points) A plot of the residuals versus the predicted values is shown below. Summarize the information this plot provides about how well the model describes the data and if conditions for inference are well satisfied. 7 f) (6 points) A new student, Jane, will enter the university as a computer science major in Fall 2017. Jane has average scores of 7, 7, and 9 in her high school classes on mathematics, science, and English, respectively. The model predicts that her GPA at the end of three semesters at the university will be 4.42. Show how this prediction is obtained by inserting appropriate values into the prediction equation you reported in part (c) of this problem. The standard error of this GPA estimate (mean estimate) is 0.085. Show how to construct an interval such that you would have 95 percent confidence that the interval will contain Jane's GPA at the end of her first three semesters at the university. Question 5: (40 points) For the study described in problem 4, the researchers also collected data on each student's score on the quantitative part of the SAT exam (SATM) and the verbal part of the SAT exam (SATV). These two SAT scores are included with the HSM, HSS, and HSE scores and the GPA at the end of three semesters in the data file posted as male_gpa.csv, for the 145 male computer science majors in the study. Use these data to build a good prediction equation for GPA at the end of the three semesters for male computer science majors. Your report should include the following parts. (a) (12 points) Briefly report on each of the steps you took to develop a good prediction model. There is no need to include all of the details for each step of your investigation, but you should write one or two sentences describing what you learned from each step of your investigation. If you think it is important to include a graphical display or table to make your point please include the graph or table in your report. 8 (b) (12 points) Report the R2 value, the ANOVA table, and a table of parameters estimates for the prediction model you think is best. Interpret the parameters in your model. You may have found more than one good model in part (a), and you can comment on those models, but you only to need to talk about the results for one of those models in this part. (c) (12 points) Use graphical displays and any other diagnostic methods covered in this class to assess how well you model fits the data and to assess if there are any problems. Comment on what you discover. (d) (4 points) Someone suggests that you could assess how well the model you reported in part (b) performs by applying it to the data for the 79 women in the study, because the data for these women were not used to fit the model you reported in part (b). You could use the model you presented in part (b) to predict the GPA for each of the 79 women and then compare those predictions to the actual GPAs achieved by those women. Do you think this is a good suggestion? Explain. 9 Hello I have gone through the work. Corrected where necessary and completed the rest of the work. However the last question requires data. Kindly if you could provide the data I could appreciate as this will assist me in satisfying you by providing comprehensive and quality work. Its a pleasure to work with you. Thank you In 2009, a large Midwestern University wanted to give a report to the state's board of regents to justify the continued expenditures for their study abroad programs. In particular, they wanted to show that students who studied abroad had better language proficiency than their peers who did not study abroad. Every entering student is required to take a language proficiency/placement exam at the beginning of their first year. Four years later, the study abroad office took a random sample of 1000 students who had completed beyond the 3 semesters minimum required for their general education credits and administered an additional proficiency exam to those students. Of the 1000 students, 221 students had completed one or more semesters abroad, whereas 779 had not done no study abroad. Of the 221 who studied abroad, 193 were found to be proficient or above in their language skills, whereas 534 of the students who had not studied abroad demonstrated proficient or above language skills. a) (3 points) Can this study be considered an experiment? Why or why not? This study is not an experiment. Because an experiment has random assignment and a researcher manipulates the Independent Variable, allows researcher to infer causation. There is not manipulation in this study. This would be an observation. b) (8 points) Let p1 represent the proportion of the proficient students who studied abroad. Let p2 represent the proportion of proficient students who did not study abroad. Find the values for each proportion, then list the additional information necessary to construct a 90 percent confidence interval for p1 - p2 , including conditions and the formula. Then compute the confidence interval. 193/221=0.8733 534/779=0.6855 CI= (P1-P2) Z*SE Standard error = .02788 ME = .04589 Z score = 1.645 =CI=(0.8733-0.6855)*1.645*0.02788 =0.18780.04589 L limit = 0.14195 U Limit = 0.23367 c) (3 points) Using the confidence interval you reported in part (a), can you reject the null hypothesis that the proportion of students who are language proficient with study abroad is the same as the proportion of students who are language proficient without study abroad? Explain. The p value is <0.1 level of alpha. There is therefore high probability of making type 1 error. so with this information we can reject the null hypothesis. The standard error is not 1 within the upper and lower limits we conclude that there is significant evidence that there is a difference between the study abroad and those who do not. d) (2 points) Give at least one good reason why it would have been better if the number of students in the study who had studied abroad would have been equal to the number of students who had not studied abroad. It would have been better for the two groups to be equal because having two different sizes causes more variance. It also can cause it to be skewed and sometimes violate normality. Sample size has much influence to standard deviation and the size of error. Therefore when one sample has a larger size than the other there is variation in the standard error as well as the size of error. As sample size increases error is minimized. Question 2 (24 points) A research laboratory is developing new compounds to provide relief from a specific allergy. In an experiment with 60 volunteer subjects who suffered from the allergy, the amounts of two active ingredients (ingredient A and ingredient B) were varied. Two levels (0.2g and 0.4g) were used for ingredient A and three levels (1g, 2g and 3g) were used for ingredient B. Consequently, six different compounds were made from the 2 3=6 combinations of levels for these two ingredients. The 60 subjects were randomly assigned to the six compounds so that 10 subjects were assigned to each of the six compounds. Each subject took a pill containing the assigned compound and the number of hours of relief from allergy symptoms was recorded for each subject. a) (3 points) Identify the experimental units (the who), the treatments, and the response that was measured on each subject. Individuals with suffered from the allergy. Treatment was 6 levels of compounds containing different dosages of ingredient A and ingredient B. The response number of hours of relief from allergy symptoms. Experimental units influence the response units in this case the 6 levels of compounds containing different dosages are the treatment units influencing number of hours of relief from allergy symptoms. b) (2 points) Randomization was used in this experiment. What is the reason (or reasons) for randomly assigning subjects to treatments? The reason for randomly assigning subjects to treatments is so it eliminates the possibility for any biases. Biasness could result from prejudice of purposive sampling and some other non-probabilistic sampling. Random sampling is the best in eliminating biasness. c) (6 points) Considering the experimental design, list at least three scientific questions which the national chain would want to answer from the data generated by this study. For each question, write an appropriate null hypothesis and an appropriate alternative hypothesis. Does higher dosage of A and B give more relief than lower dosages? Does Ingredient B offer more relief than ingredient A? Does Ingredient A offer more relief than ingredient B? H o=0 H A 0 H o=a1+ a2 +a3 =0 H A a1 +a2 +a 3 0 H o= 1 + 2+ 3 =0 H A 1 + 2 + 3 0 2 d) (8 points) Outline an analysis of variance that would be useful for testing some or all of the hypotheses you presented in part (c). You should have one line in your ANOVA table for each source of variation and you should report the value for the corresponding degrees of freedom. Using the outline of the ANOVA table, describe how you would test some or all of the hypotheses you presented in part (c). (Note that you do not have any data, so you cannot compute numerical values for sums of squares, mean squares, or test statistics.) Source of SS Df MS F P variation Ingriedient A 1 Ingredient B 2 Ingredient A x B 5 Within 54 Total 62 e) (3 points) Although blocking was not done in this experiment, describe the potential benefit of using blocking in an experiment. Identify a potential blocking factor for this experiment, explain why it would be a good blocking factor, and describe how the experiment could be performed as a randomized block design. By blocking you can isolate the variability attributive to the difference between blocks, so you can see the difference caused by the treatment more clearly. It also helps to avoid nuisance factors. A potential blocking factor one could use is blocking all the levels with ingredient A at 0.2g, and blocks all the levels with ingredient A at 0.4g. by doing this one can see if there is a significant difference between the two different levels of ingredient A. f) (2 points) Describe what would need to be done to make this a double blind experiment. To make this a double blind experiment the person receiving the treatment wouldn't know what level of treatment they are receiving and the person administering the treatment wouldn't know what treatment the participant is receiving either. Question 3: (12 points) 3 A random poll of voting age citizens in Montana was conducted to gauge the current partisan make up of the state to prepare for the upcoming primary election. Participants were asked to identify their gender and party affiliation. Gender Men Women Democrat 36 48 Independent Republican 24 45 16 33 Do these data suggest that there are significant differences in the distribution of partisan affiliations for men and women? Perform an appropriate test to address this question. Your response should include: i) ii) A precise statement of the null hypothesis and the alternative hypothesis. H 0:there is no associationbetween gender partisan affiliation H A :there is association between gender partisan affiliation 0 Checks of the conditions for inference. Randomization is met. The participants are from a random poll of voting age citizens. Normality is met because it has a large sample, n=202. 10 % is met its less than 10% of the entire population. iii) The formula and value of your test statistic, relevant degrees of freedom, and a pvalue. gender democrats independent republican total men 36 24 45 105 women 48 16 33 97 total 84 40 78 202 We wish to calculate chi square for test of association between gender and political affiliations. A11= 84*105/202=43.66 A21=84*97/202=40.3366 A12=40*105/202=20.792 A22=40*97/202=19.2079 A13=78*105/202=40.5445 A33=78*97/202=37.4554 Chi square is given by (o-e)2/e =(36-43.66)2/43.66 +(48-40.3366)2/40.3366 +(24-20.792)2/20.792 +(16-19.2079)2/19.2079 +(45-40.5445)2/40.5445 +(33-37.4554)2/37.4554 =1.3439+1.4559+0.4949+0.5357+0.4896+0.5299 =4.84998 iv) A clear statement of your conclusion in the context of this study. 4 Degrees of freedom= (3-1)*(2-1) =2*1=2 Critical value at 2 degrees of freedom and 0.05 level of significance is equal to =5.99 Calculated value is less than critical value we therefore do not reject null hypotheses. The p-value for the test is 0.08848 This is greater than 0.05. We therefore affirm the decision not to reject null hypotheses. We therefore conclude that there was no association between gender and party affiliation and hence no difference between men and women. Question 4: (48 points) As part of a study of student performance at a large university, data were collected on a random sample of freshman computer science majors. Of particular interest was the cumulative grade point average (GPA) at the end of each student's first three semesters at the university. Other information recorded on each student at the time the student enrolled at the university includes average high school grades in mathematics (HSM), average high school grades in science (HSS), and average high school grades in English and communication courses (HSE). Researchers at the university were interested in predicting the GPA's for computer science majors at the end of first three semesters of enrollment from the information on high school grades. In this data set, high school grades were coded on a scale from 1 to 10, with 10 corresponding to an A, 9 to a A-, 8 to a B+, etc. At this university, GPA's are recorded on a scale from 0 to 6, with 6 corresponding to a straight A performance. Results for 224 computer science majors were included in this study; there were 145 men and 79 women. a) The researchers wanted to know if there was a significant difference between the average GPA's at the end of three semesters of study for men and women computer science majors. They created the following box plot and compiled the following summary statistics. Gender Men Women GPA Summary Statistics Standard Number Mean Deviation 145 4.6077 0.8068 79 4.6857 0.7288 5 95% Confidence Intervals Lower Upper 4.5225 4.8489 4.4753 4.7402 i) (4 points) What information is provided by the side-by-side box plots? Do conditions for inference appear to be satisfied? The box plots show us that the center of distribution is about the same, the median is about the same, variation is about the same and the IQR are about the same. It shows us that could be a little skewed to the left being its towards the top of the whisker. So it shows us we have normal distribution. And our normality is met. ii) (2 points) State the null hypothesis and the alternative hypothesis that the researchers should use to answer their question. 12=0 12 0 iii) (2 points) Explain why a two sample t-test should be used in this situation instead of a paired t-test. A two sample t test should be used in this instance because of several factors that could be part of this study. Some examples would be some students entering into college not taking computer classes before or all of them taking different classes from different schools. Another factor to take into consideration is students taking different classes each semester so one is not comparing the same class load with each other. Another reason is because of different levels of education when entering into the school. iv) (6 points) Perform the t-test. Report the value the test statistic, its degrees of freedom, and report a p-value. State your conclusion in the context of this study. Difference = (4.6077-4.6857) =-0.078 Standard error for the difference between means = 0.10911361 T-stat = (mean 1- mean 2)/SE -0.078/0.10911361 = -0.7148512 Df = 222 P-Value = 0.4755 The p-value is more than 0.05 level of significance we therefore do not reject null hypotheses. We therefore conclude that there is no difference GPA's at the end of three semesters of study for men and women computer science majors. v) (4 points) Explain why checking if the 95% confidence interval for the mean GPA for men overlaps with the 95% confidence interval for the mean GPA for women is not an appropriate method for testing the null hypothesis you stated in part (ii). The sample sizes were different and this will have an influence on the confidence interval. The larger the sample size the narrower is the confidence interval. Therefore checking for the overlapping of the confidence intervals will be inappropriate for testing. 6 b) (4 points) As a first step in examining the relationship between GPA after the first three semesters at the university and average high school grades in mathematics (HSM), science (HSS) and English (HSE), the researchers computed the following correlation results and the following scatterplot matrix. GPA HSM HSS HSE Variable HSM HSS HSS HSE HSE HSE GPA 1.0000 0.4365 0.3294 0.2890 by Variable GPA GPA HSM GPA HSM HSS Correlations HSM 0.4365 1.0000 0.5757 0.4469 HSS 0.3294 0.5757 1.0000 0.5794 Pairwise Correlations Correlation 0.4365 0.3294 0.5757 0.2890 0.4469 0.5794 Count 224 224 224 224 224 224 Lower 95% 0.3240 0.2073 0.4809 0.1641 0.3355 0.4851 HSE 0.2890 0.4469 0.5794 1.0000 Upper 95% 0.5369 0.4414 0.6572 0.4048 0.5460 0.6603 Signif Prob <.0001* <.0001* <.0001* <.0001* <.0001* <.0001* Summarize what these results tell you about relationships between the four variables, GPA, HSM, HSS, and HSE. 7 All the variables are positively correlated. This means that as one variable increases the other variable also increases. However most of the variables had a weak and moderate positive linear relationship. For instance, mathematics and GPA had a moderate relationship indicated by the correlation coefficient of 0.4365 home science and GPA weak relationship indicated by a coefficient of 0.3294. However mathematics and GPA had a much improved positive relationship of shown by a correlation coefficient above 0.5. English and GPA had a weak relationship too with a correlation coefficient of 0.2890. English and mathematics had a moderate positive relationship of 0.4469 while English and home science was slightly strongly related with a correlation coefficient above 0.5. All the coefficients were significant with p-values less than alpha. c) The following results (produced by JMP) are from the regression of GPA on HSM, HSS, and HSE. Analysis of Variance Source Model Error C. Total DF 3 220 223 Sum of Squares 27.71233 107.75046 135.46279 Term Intercept HSM HSS HSE Estimate 2.5898766 0.1685666 0.0343156 0.0451018 Mean Square 9.23744 0.48977 F Ratio 18.8606 Prob > F <.0001 R-Square 0.2046 Parameter Estimates i) (2 points) Interpret the Std Error 0.294243 0.035492 0.037559 0.038696 t Ratio 8.80 4.75 0.91 1.17 Prob>|t| <.0001 <.0001 0.3619 0.2451 R2 value. 0.2046 means that the 20.46 of the GPA is influenced by the model or otherwise determined by the independent variables which are Mathematics home science and English. ii) (6 points) Write out the formula for the equation for predicting GPA's for computer science students at the end of the first three semesters of enrollment from high school academic performance as summarized by the HSM, HSS, and HSE variables. Interpret the estimates of the regression coefficients for this model. GPA=2.5898766+0.1685666 HSM+0.0343154 HSS+0.0451018 HSE The first coefficient which is 2.5898766 is the constant value of GPA that is not influenced by the independent variables. 0.1685666 is the amount of change in GPA as a result of unit change in mathematics. While 0.0343154 is the amount of change in GPA as a result of a unit change in Home science. 0.0451018 is the amount of change in GPA due to a unit change in English. 8 iii) (4 points) The coefficients for HSS and HSE are not statistically significant for this model. Does this imply that neither HHS nor HSE provide any information for predicting GPA? Should both HSS and HSE be deleted from the model? Explain. This does not mean that because the coefficients for home science and English are not statistically significant does not provide any information in predicting GPA. However this indicates that the data collected could not provide enough evidence on how important the variables were in predicting GPA. Therefore the variables should be deleted from the model d) (4 points) Two of the partial residual (leverage) plots produced by JMP are shown below. Summarize the information in these plots. e) (4 points) A plot of the residuals versus the predicted values is shown below. Summarize the information this plot provides about how well the model describes the data and if conditions for inference are well satisfied. There is a relationship between the dependent variable and the residuals. For instance mathematics and residuals are highly correlated however home science is slightly or else weakly related with residuals. This violates the conditions for inference. Independent variables should not be correlated with the error term. The error term should not also be related with dependent variable which in this case is not related to GPA as indicated by the straight horizontal correlation line. 9 f) (6 points) A new student, Jane, will enter the university as a computer science major in Fall 2017. Jane has average scores of 7, 7, and 9 in her high school classes on mathematics, science, and English, respectively. The model predicts that her GPA at the end of three semesters at the university will be 4.42. Show how this prediction is obtained by inserting appropriate values into the prediction equation you reported in part (c) of this problem. The standard error of this GPA estimate (mean estimate) is 0.085. Show how to construct an interval such that you would have 95 percent confidence that the interval will contain Jane's GPA at the end of her first three semesters at the university. GPA=2.5898766+0.1685666 HSM+0.0343154 HSS+0.0451018 HSE GPA=2.5898766+0.1685666 (7)+0.0343154 (7)+0.0451018(9) =4.42 CI= mean se*z =4.420.085*1.96 =4.420.1666 =(4.2534, 4.5866) 10 Question 5: (40 points) For the study described in problem 4, the researchers also collected data on each student's score on the quantitative part of the SAT exam (SATM) and the verbal part of the SAT exam (SATV). These two SAT scores are included with the HSM, HSS, and HSE scores and the GPA at the end of three semesters in the data file posted as male_gpa.csv, for the 145 male computer science majors in the study. Use these data to build a good prediction equation for GPA at the end of the three semesters for male computer science majors. Your report should include the following parts. (a) (12 points) Briefly report on each of the steps you took to develop a good prediction model. There is no need to include all of the details for each step of your investigation, but you should write one or two sentences describing what you learned from each step of your investigation. If you think it is important to include a graphical display or table to make your point please include the graph or table in your report. (b) (12 points) Report the R2 value, the ANOVA table, and a table of parameters estimates for the prediction model you think is best. Interpret the parameters in your model. You may have found more than one good model in part (a), and you can comment on those models, but you only to need to talk about the results for one of those models in this part. (c) (12 points) Use graphical displays and any other diagnostic methods covered in this class to assess how well you model fits the data and to assess if there are any problems. Comment on what you discover. (d) (4 points) Someone suggests that you could assess how well the model you reported in part (b) performs by applying it to the data for the 79 women in the study, because the data for these women were not used to fit the model you reported in part (b). You could use the model you presented in part (b) to predict the GPA for each of the 79 women and then compare those predictions to the actual GPAs achieved by those women. Do you think this is a good suggestion? Explain. 11 qattachments_fae608467f68d7d7d0643d294980e341de48ef5f GPA HSM HSS HSE SATM SATV 5.32 10 10 10 670 600 3.84 9 6 6 610 390 4.26 6 8 5 700 640 4.35 8 6 8 640 530 5.72 7 8 7 550 500 4.08 9 10 7 670 600 5.38 8 9 8 540 580 2.4 6 6 7 560 690 5.5 8 7 8 630 500 5.29 10 8 8 760 630 4.83 6 7 7 690 440 4.88 9 7 6 690 460 5.06 8 6 5 540 400 5.21 8 8 7 600 400 4 3 7 6 460 530 5.18 9 10 8 670 450 4.77 6 5 9 590 440 4.34 7 7 6 570 480 4.26 5 7 7 530 440 4.03 6 7 9 540 610 5.08 9 10 6 491 488 5.34 5 9 7 600 600 3.4 6 8 8 510 530 3.43 10 9 9 750 610 4.48 8 9 6 650 460 5.73 10 10 9 720 630 4.43 7 10 10 530 560 5.8 10 10 9 760 500 6 9 10 10 640 480 6 9 9 8 800 610 4 9 6 5 640 670 5.74 9 10 9 750 700 4.32 9 7 8 520 440 4.63 10 10 6 640 500 4.79 8 8 7 610 530 5.7 8 10 8 520 410 3.66 8 4 3 590 470 5.41 9 9 9 520 490 5.21 7 9 8 505 435 5.08 9 10 8 559 607 4.81 9 7 4 559 488 4.12 7 7 8 559 545 5.33 7 6 7 500 460 5.75 10 9 9 760 620 3.69 8 7 7 490 390 3.93 8 6 8 590 510 5.16 10 9 8 640 490 4.73 9 8 7 520 360 4.46 6 7 7 490 370 5.06 8 10 10 580 460 3.07 7 8 6 700 520 5.35 10 10 10 620 570 Page 1 qattachments_fae608467f68d7d7d0643d294980e341de48ef5f 3.82 3.59 3.14 2.65 4.12 5.7 4.82 5.12 4.25 4.93 4.34 5.16 4.11 3.34 4.96 4.83 5.1 5.07 4.53 4.87 4.75 5.61 5.14 4.25 3 4.79 4.17 3.95 4.39 4.15 2.75 5.35 5.58 5.06 4.75 4.5 4.78 4.64 5.26 5.09 5.41 4.44 3.11 5 5.67 5.12 5.81 4.17 2.12 5.3 4.3 4 5.22 6 8 10 9 7 10 4 10 9 10 8 9 6 6 9 10 9 7 8 9 10 10 9 10 8 9 10 7 6 6 7 10 10 5 10 9 9 9 10 10 9 8 7 4 10 10 10 8 4 10 9 6 9 8 9 10 7 6 10 5 10 7 10 9 7 9 7 7 9 10 4 9 9 10 10 8 10 9 6 7 8 5 6 6 10 7 9 7 9 9 9 10 10 4 8 7 3 10 10 10 7 6 10 10 5 7 6 7 7 7 7 10 7 7 4 10 7 7 9 8 6 9 9 7 8 9 10 9 9 10 10 7 7 9 6 6 6 10 8 9 5 9 10 8 9 8 7 8 7 4 10 10 7 8 6 9 10 6 9 490 670 720 640 520 580 400 640 550 600 480 400 480 530 670 710 750 660 550 620 720 630 640 690 640 690 650 550 470 480 540 730 710 510 770 560 600 620 610 540 600 690 590 620 640 580 750 650 630 650 590 530 650 550 480 610 520 380 580 470 520 290 520 410 390 390 470 440 530 670 480 500 480 500 440 630 580 600 400 450 570 330 460 590 650 400 380 720 500 510 590 560 470 360 490 480 560 570 340 540 500 490 480 420 320 490 Page 2 qattachments_fae608467f68d7d7d0643d294980e341de48ef5f 5.62 3.88 5.2 4.55 4.04 4.82 5.25 4.97 4.21 4.81 4.5 5.03 3.92 4.58 4.16 6 4.5 3.81 3.85 4.7 4.96 3.46 5.32 4.76 4.71 5.4 4.95 4.65 4.48 2.8 5.86 5.4 2.91 4.67 4.51 3.79 4.42 2.58 5 4.62 10 10 9 7 8 10 9 10 7 10 10 8 9 10 6 7 7 9 10 6 9 7 10 10 8 9 9 8 8 8 10 7 6 9 9 7 6 5 10 9 10 6 5 8 7 9 7 10 7 10 9 8 10 9 6 6 10 9 8 8 7 7 9 10 7 10 9 10 8 10 10 8 5 9 8 7 6 7 10 8 8 6 7 8 7 9 8 10 8 10 9 7 8 9 6 6 10 9 7 6 8 8 10 10 9 9 8 8 7 9 10 4 7 10 7 5 8 7 9 7 660 620 570 570 690 660 690 770 670 620 660 600 447 720 590 600 630 620 700 580 630 630 660 600 700 550 620 680 630 470 750 710 586 586 700 550 505 515 774 491 630 430 570 480 440 550 550 540 500 570 460 630 320 740 440 410 500 580 480 470 630 540 560 560 440 560 400 450 500 410 760 500 697 670 500 570 518 285 688 391 Page 3 qattachments_fae608467f68d7d7d0643d294980e341de48ef5f GPA HSM HSS HSE SATM SATV 5.32 10 10 10 670 600 3.84 9 6 6 610 390 4.26 6 8 5 700 640 4.35 8 6 8 640 530 5.72 7 8 7 550 500 4.08 9 10 7 670 600 5.38 8 9 8 540 580 2.4 6 6 7 560 690 5.5 8 7 8 630 500 5.29 10 8 8 760 630 4.83 6 7 7 690 440 4.88 9 7 6 690 460 5.06 8 6 5 540 400 5.21 8 8 7 600 400 4 3 7 6 460 530 5.18 9 10 8 670 450 4.77 6 5 9 590 440 4.34 7 7 6 570 480 4.26 5 7 7 530 440 4.03 6 7 9 540 610 5.08 9 10 6 491 488 5.34 5 9 7 600 600 3.4 6 8 8 510 530 3.43 10 9 9 750 610 4.48 8 9 6 650 460 5.73 10 10 9 720 630 4.43 7 10 10 530 560 5.8 10 10 9 760 500 6 9 10 10 640 480 6 9 9 8 800 610 4 9 6 5 640 670 5.74 9 10 9 750 700 4.32 9 7 8 520 440 4.63 10 10 6 640 500 4.79 8 8 7 610 530 5.7 8 10 8 520 410 3.66 8 4 3 590 470 5.41 9 9 9 520 490 5.21 7 9 8 505 435 5.08 9 10 8 559 607 4.81 9 7 4 559 488 4.12 7 7 8 559 545 5.33 7 6 7 500 460 5.75 10 9 9 760 620 3.69 8 7 7 490 390 3.93 8 6 8 590 510 5.16 10 9 8 640 490 4.73 9 8 7 520 360 4.46 6 7 7 490 370 5.06 8 10 10 580 460 3.07 7 8 6 700 520 5.35 10 10 10 620 570 Page 1 qattachments_fae608467f68d7d7d0643d294980e341de48ef5f 3.82 3.59 3.14 2.65 4.12 5.7 4.82 5.12 4.25 4.93 4.34 5.16 4.11 3.34 4.96 4.83 5.1 5.07 4.53 4.87 4.75 5.61 5.14 4.25 3 4.79 4.17 3.95 4.39 4.15 2.75 5.35 5.58 5.06 4.75 4.5 4.78 4.64 5.26 5.09 5.41 4.44 3.11 5 5.67 5.12 5.81 4.17 2.12 5.3 4.3 4 5.22 6 8 10 9 7 10 4 10 9 10 8 9 6 6 9 10 9 7 8 9 10 10 9 10 8 9 10 7 6 6 7 10 10 5 10 9 9 9 10 10 9 8 7 4 10 10 10 8 4 10 9 6 9 8 9 10 7 6 10 5 10 7 10 9 7 9 7 7 9 10 4 9 9 10 10 8 10 9 6 7 8 5 6 6 10 7 9 7 9 9 9 10 10 4 8 7 3 10 10 10 7 6 10 10 5 7 6 7 7 7 7 10 7 7 4 10 7 7 9 8 6 9 9 7 8 9 10 9 9 10 10 7 7 9 6 6 6 10 8 9 5 9 10 8 9 8 7 8 7 4 10 10 7 8 6 9 10 6 9 490 670 720 640 520 580 400 640 550 600 480 400 480 530 670 710 750 660 550 620 720 630 640 690 640 690 650 550 470 480 540 730 710 510 770 560 600 620 610 540 600 690 590 620 640 580 750 650 630 650 590 530 650 550 480 610 520 380 580 470 520 290 520 410 390 390 470 440 530 670 480 500 480 500 440 630 580 600 400 450 570 330 460 590 650 400 380 720 500 510 590 560 470 360 490 480 560 570 340 540 500 490 480 420 320 490 Page 2 qattachments_fae608467f68d7d7d0643d294980e341de48ef5f 5.62 3.88 5.2 4.55 4.04 4.82 5.25 4.97 4.21 4.81 4.5 5.03 3.92 4.58 4.16 6 4.5 3.81 3.85 4.7 4.96 3.46 5.32 4.76 4.71 5.4 4.95 4.65 4.48 2.8 5.86 5.4 2.91 4.67 4.51 3.79 4.42 2.58 5 4.62 10 10 9 7 8 10 9 10 7 10 10 8 9 10 6 7 7 9 10 6 9 7 10 10 8 9 9 8 8 8 10 7 6 9 9 7 6 5 10 9 10 6 5 8 7 9 7 10 7 10 9 8 10 9 6 6 10 9 8 8 7 7 9 10 7 10 9 10 8 10 10 8 5 9 8 7 6 7 10 8 8 6 7 8 7 9 8 10 8 10 9 7 8 9 6 6 10 9 7 6 8 8 10 10 9 9 8 8 7 9 10 4 7 10 7 5 8 7 9 7 660 620 570 570 690 660 690 770 670 620 660 600 447 720 590 600 630 620 700 580 630 630 660 600 700 550 620 680 630 470 750 710 586 586 700 550 505 515 774 491 630 430 570 480 440 550 550 540 500 570 460 630 320 740 440 410 500 580 480 470 630 540 560 560 440 560 400 450 500 410 760 500 697 670 500 570 518 285 688 391 Page 3 Hello I have gone through the work. Corrected where necessary and completed the rest of the work. However the last question requires data. Kindly if you could provide the data I could appreciate as this will assist me in satisfying you by providing comprehensive and quality work. Its a pleasure to work with you. Thank you In 2009, a large Midwestern University wanted to give a report to the state's board of regents to justify the continued expenditures for their study abroad programs. In particular, they wanted to show that students who studied abroad had better language proficiency than their peers who did not study abroad. Every entering student is required to take a language proficiency/placement exam at the beginning of their first year. Four years later, the study abroad office took a random sample of 1000 students who had completed beyond the 3 semesters minimum required for their general education credits and administered an additional proficiency exam to those students. Of the 1000 students, 221 students had completed one or more semesters abroad, whereas 779 had not done no study abroad. Of the 221 who studied abroad, 193 were found to be proficient or above in their language skills, whereas 534 of the students who had not studied abroad demonstrated proficient or above language skills. a) (3 points) Can this study be considered an experiment? Why or why not? This study is not an experiment. Because an experiment has random assignment and a researcher manipulates the Independent Variable, allows researcher to infer causation. There is not manipulation in this study. This would be an observation. b) (8 points) Let p1 represent the proportion of the proficient students who studied abroad. Let p2 represent the proportion of proficient students who did not study abroad. Find the values for each proportion, then list the additional information necessary to construct a 90 percent confidence interval for p1 - p2 , including conditions and the formula. Then compute the confidence interval. 193/221=0.8733 534/779=0.6855 CI= (P1-P2) Z*SE Standard error = .02788 ME = .04589 Z score = 1.645 =CI=(0.8733-0.6855)*1.645*0.02788 =0.18780.04589 L limit = 0.14195 U Limit = 0.23367 c) (3 points) Using the confidence interval you reported in part (a), can you reject the null hypothesis that the proportion of students who are language proficient with study abroad is the same as the proportion of students who are language proficient without study abroad? Explain. The p value is <0.1 level of alpha. There is therefore high probability of making type 1 error. so with this information we can reject the null hypothesis. The standard error is not 1 within the upper and lower limits we conclude that there is significant evidence that there is a difference between the study abroad and those who do not. d) (2 points) Give at least one good reason why it would have been better if the number of students in the study who had studied abroad would have been equal to the number of students who had not studied abroad. It would have been better for the two groups to be equal because having two different sizes causes more variance. It also can cause it to be skewed and sometimes violate normality. Sample size has much influence to standard deviation and the size of error. Therefore when one sample has a larger size than the other there is variation in the standard error as well as the size of error. As sample size increases error is minimized. Question 2 (24 points) A research laboratory is developing new compounds to provide relief from a specific allergy. In an experiment with 60 volunteer subjects who suffered from the allergy, the amounts of two active ingredients (ingredient A and ingredient B) were varied. Two levels (0.2g and 0.4g) were used for ingredient A and three levels (1g, 2g and 3g) were used for ingredient B. Consequently, six different compounds were made from the 2 3=6 combinations of levels for these two ingredients. The 60 subjects were randomly assigned to the six compounds so that 10 subjects were assigned to each of the six compounds. Each subject took a pill containing the assigned compound and the number of hours of relief from allergy symptoms was recorded for each subject. a) (3 points) Identify the experimental units (the who), the treatments, and the response that was measured on each subject. Individuals with suffered from the allergy. Treatment was 6 levels of compounds containing different dosages of ingredient A and ingredient B. The response number of hours of relief from allergy symptoms. Experimental units influence the response units in this case the 6 levels of compounds containing different dosages are the treatment units influencing number of hours of relief from allergy symptoms. b) (2 points) Randomization was used in this experiment. What is the reason (or reasons) for randomly assigning subjects to treatments? The reason for randomly assigning subjects to treatments is so it eliminates the possibility for any biases. Biasness could result from prejudice of purposive sampling and some other non-probabilistic sampling. Random sampling is the best in eliminating biasness. c) (6 points) Considering the experimental design, list at least three scientific questions which the national chain would want to answer from the data generated by this study. For each question, write an appropriate null hypothesis and an appropriate alternative hypothesis. Does higher dosage of A and B give more relief than lower dosages? Does Ingredient B offer more relief than ingredient A? Does Ingredient A offer more relief than ingredient B? H o=0 H A 0 H o=a1+ a2 +a3 =0 H A a1 +a2 +a 3 0 H o= 1 + 2+ 3 =0 H A 1 + 2 + 3 0 2 d) (8 points) Outline an analysis of variance that would be useful for testing some or all of the hypotheses you presented in part (c). You should have one line in your ANOVA table for each source of variation and you should report the value for the corresponding degrees of freedom. Using the outline of the ANOVA table, describe how you would test some or all of the hypotheses you presented in part (c). (Note that you do not have any data, so you cannot compute numerical values for sums of squares, mean squares, or test statistics.) Source of SS Df MS F P variation Ingriedient A 1 Ingredient B 2 Ingredient A x B 5 Within 54 Total 62 e) (3 points) Although blocking was not done in this experiment, describe the potential benefit of using blocking in an experiment. Identify a potential blocking factor for this experiment, explain why it would be a good blocking factor, and describe how the experiment could be performed as a randomized block design. By blocking you can isolate the variability attributive to the difference between blocks, so you can see the difference caused by the treatment more clearly. It also helps to avoid nuisance factors. A potential blocking factor one could use is blocking all the levels with ingredient A at 0.2g, and blocks all the levels with ingredient A at 0.4g. by doing this one can see if there is a significant difference between the two different levels of ingredient A. f) (2 points) Describe what would need to be done to make this a double blind experiment. To make this a double blind experiment the person receiving the treatment wouldn't know what level of treatment they are receiving and the person administering the treatment wouldn't know what treatment the participant is receiving either. Question 3: (12 points) 3 A random poll of voting age citizens in Montana was conducted to gauge the current partisan make up of the state to prepare for the upcoming primary election. Participants were asked to identify their gender and party affiliation. Gender Men Women Democrat 36 48 Independent Republican 24 45 16 33 Do these data suggest that there are significant differences in the distribution of partisan affiliations for men and women? Perform an appropriate test to address this question. Your response should include: i) ii) A precise statement of the null hypothesis and the alternative hypothesis. H 0:there is no associationbetween gender partisan affiliation H A :there is association between gender partisan affiliation 0 Checks of the conditions for inference. Randomization is met. The participants are from a random poll of voting age citizens. Normality is met because it has a large sample, n=202. 10 % is met its less than 10% of the entire population. iii) The formula and value of your test statistic, relevant degrees of freedom, and a pvalue. gender democrats independent republican total men 36 24 45 105 women 48 16 33 97 total 84 40 78 202 We wish to calculate chi square for test of association between gender and political affiliations. A11= 84*105/202=43.66 A21=84*97/202=40.3366 A12=40*105/202=20.792 A22=40*97/202=19.2079 A13=78*105/202=40.5445 A33=78*97/202=37.4554 Chi square is given by (o-e)2/e =(36-43.66)2/43.66 +(48-40.3366)2/40.3366 +(24-20.792)2/20.792 +(16-19.2079)2/19.2079 +(45-40.5445)2/40.5445 +(33-37.4554)2/37.4554 =1.3439+1.4559+0.4949+0.5357+0.4896+0.5299 =4.84998 iv) A clear statement of your conclusion in the context of this study. 4 Degrees of freedom= (3-1)*(2-1) =2*1=2 Critical value at 2 degrees of freedom and 0.05 level of significance is equal to =5.99 Calculated value is less than critical value we therefore do not reject null hypotheses. The p-value for the test is 0.08848 This is greater than 0.05. We therefore affirm the decision not to reject null hypotheses. We therefore conclude that there was no association between gender and party affiliation and hence no difference between men and women. Question 4: (48 points) As part of a study of student performance at a large university, data were collected on a random sample of freshman computer science majors. Of particular interest was the cumulative grade point average (GPA) at the end of each student's first three semesters at the university. Other information recorded on each student at the time the student enrolled at the university includes average high school grades in mathematics (HSM), average high school grades in science (HSS), and average high school grades in English and communication courses (HSE). Researchers at the university were interested in predicting the GPA's for computer science majors at the end of first three semesters of enrollment from the information on high school grades. In this data set, high school grades were coded on a scale from 1 to 10, with 10 corresponding to an A, 9 to a A-, 8 to a B+, etc. At this university, GPA's are recorded on a scale from 0 to 6, with 6 corresponding to a straight A performance. Results for 224 computer science majors were included in this study; there were 145 men and 79 women. a) The researchers wanted to know if there was a significant difference between the average GPA's at the end of three semesters of study for men and women computer science majors. They created the following box plot and compiled the following summary statistics. Gender Men Women GPA Summary Statistics Standard Number Mean Deviation 145 4.6077 0.8068 79 4.6857 0.7288 5 95% Confidence Intervals Lower Upper 4.5225 4.8489 4.4753 4.7402 i) (4 points) What information is provided by the side-by-side box plots? Do conditions for inference appear to be satisfied? The box plots show us that the center of distribution is about the same, the median is about the same, variation is about the same and the IQR are about the same. It shows us that could be a little skewed to the left being its towards the top of the whisker. So it shows us we have normal distribution. And our normality is met. ii) (2 points) State the null hypothesis and the alternative hypothesis that the researchers should use to answer their question. 12=0 12 0 iii) (2 points) Explain why a two sample t-test should be used in this situation instead of a paired t-test. A two sample t test should be used in this instance because of several factors that could be part of this study. Some examples would be some students entering into college not taking computer classes before or all of them taking different classes from different schools. Another factor to take into consideration is students taking different classes each semester so one is not comparing the same class load with each other. Another reason is because of different levels of education when entering into the school. iv) (6 points) Perform the t-test. Report the value the test statistic, its degrees of freedom, and report a p-value. State your conclusion in the context of this study. Difference = (4.6077-4.6857) =-0.078 Standard error for the difference between means = 0.10911361 T-stat = (mean 1- mean 2)/SE -0.078/0.10911361 = -0.7148512 Df = 222 P-Value = 0.4755 The p-value is more than 0.05 level of significance we therefore do not reject null hypotheses. We therefore conclude that there is no difference GPA's at the end of three semesters of study for men and women computer science majors. v) (4 points) Explain why checking if the 95% confidence interval for the mean GPA for men overlaps with the 95% confidence interval for the mean GPA for women is not an appropriate method for testing the null hypothesis you stated in part (ii). The sample sizes were different and this will have an influence on the confidence interval. The larger the sample size the narrower is the confidence interval. Therefore checking for the overlapping of the confidence intervals will be inappropriate for testing. 6 b) (4 points) As a first step in examining the relationship between GPA after the first three semesters at the university and average high school grades in mathematics (HSM), science (HSS) and English (HSE), the researchers computed the following correlation results and the following scatterplot matrix. GPA HSM HSS HSE Variable HSM HSS HSS HSE HSE HSE GPA 1.0000 0.4365 0.3294 0.2890 by Variable GPA GPA HSM GPA HSM HSS Correlations HSM 0.4365 1.0000 0.5757 0.4469 HSS 0.3294 0.5757 1.0000 0.5794 Pairwise Correlations Correlation 0.4365 0.3294 0.5757 0.2890 0.4469 0.5794 Count 224 224 224 224 224 224 Lower 95% 0.3240 0.2073 0.4809 0.1641 0.3355 0.4851 HSE 0.2890 0.4469 0.5794 1.0000 Upper 95% 0.5369 0.4414 0.6572 0.4048 0.5460 0.6603 Signif Prob <.0001* <.0001* <.0001* <.0001* <.0001* <.0001* Summarize what these results tell you about relationships between the four variables, GPA, HSM, HSS, and HSE. 7 All the variables are positively correlated. This means that as one variable increases the other variable also increases. However most of the variables had a weak and moderate positive linear relationship. For instance, mathematics and GPA had a moderate relationship indicated by the correlation coefficient of 0.4365 home science and GPA weak relationship indicated by a coefficient of 0.3294. However mathematics and GPA had a much improved positive relationship of shown by a correlation coefficient above 0.5. English and GPA had a weak relationship too with a correlation coefficient of 0.2890. English and mathematics had a moderate positive relationship of 0.4469 while English and home science was slightly strongly related with a correlation coefficient above 0.5. All the coefficients were significant with p-values less than alpha. c) The following results (produced by JMP) are from the regression of GPA on HSM, HSS, and HSE. Analysis of Variance Source Model Error C. Total DF 3 220 223 Sum of Squares 27.71233 107.75046 135.46279 Term Intercept HSM HSS HSE Estimate 2.5898766 0.1685666 0.0343156 0.0451018 Mean Square 9.23744 0.48977 F Ratio 18.8606 Prob > F <.0001 R-Square 0.2046 Parameter Estimates i) (2 points) Interpret the Std Error 0.294243 0.035492 0.037559 0.038696 t Ratio 8.80 4.75 0.91 1.17 Prob>|t| <.0001 <.0001 0.3619 0.2451 R2 value. 0.2046 means that the 20.46 of the GPA is influenced by the model or otherwise determined by the independent variables which are Mathematics home science and English. ii) (6 points) Write out the formula for the equation for predicting GPA's for computer science students at the end of the first three semesters of enrollment from high school academic performance as summarized by the HSM, HSS, and HSE variables. Interpret the estimates of the regression coefficients for this model. GPA=2.5898766+0.1685666 HSM+0.0343154 HSS+0.0451018 HSE The first coefficient which is 2.5898766 is the constant value of GPA that is not influenced by the independent variables. 0.1685666 is the amount of change in GPA as a result of unit change in mathematics. While 0.0343154 is the amount of change in GPA as a result of a unit change in Home science. 0.0451018 is the amount of change in GPA due to a unit change in English. 8 iii) (4 points) The coefficients for HSS and HSE are not statistically significant for this model. Does this imply that neither HHS nor HSE provide any information for predicting GPA? Should both HSS and HSE be deleted from the model? Explain. This does not mean that because the coefficients for home science and English are not statistically significant does not provide any information in predicting GPA. However this indicates that the data collected could not provide enough evidence on how important the variables were in predicting GPA. Therefore the variables should be deleted from the model d) (4 points) Two of the partial residual (leverage) plots produced by JMP are shown below. Summarize the information in these plots. e) (4 points) A plot of the residuals versus the predicted values is shown below. Summarize the information this plot provides about how well the model describes the data and if conditions for inference are well satisfied. There is a relationship between the dependent variable and the residuals. For instance mathematics and residuals are highly correlated however home science is slightly or else weakly related with residuals. This violates the conditions for inference. Independent variables should not be correlated with the error term. The error term should not also be related with dependent variable which in this case is not related to GPA as indicated by the straight horizontal correlation line. 9 f) (6 points) A new student, Jane, will enter the university as a computer science major in Fall 2017. Jane has average scores of 7, 7, and 9 in her high school classes on mathematics, science, and English, respectively. The model predicts that her GPA at the end of three semesters at the university will be 4.42. Show how this prediction is obtained by inserting appropriate values into the prediction equation you reported in part (c) of this problem. The standard error of this GPA estimate (mean estimate) is 0.085. Show how to construct an interval such that you would have 95 percent confidence that the interval will contain Jane's GPA at the end of her first three semesters at the university. GPA=2.5898766+0.1685666 HSM+0.0343154 HSS+0.0451018 HSE GPA=2.5898766+0.1685666 (7)+0.0343154 (7)+0.0451018(9) =4.42 CI= mean se*z =4.420.085*1.96 =4.420.1666 =(4.2534, 4.5866) 10 Question 5: (40 points) For the study described in problem 4, the researchers also collected data on each student's score on the quantitative part of the SAT exam (SATM) and the verbal part of the SAT exam (SATV). These two SAT scores are included with the HSM, HSS, and HSE scores and the GPA at the end of three semesters in the data file posted as male_gpa.csv, for the 145 male computer science majors in the study. Use these data to build a good prediction equation for GPA at the end of the three semesters for male computer science majors. Your report should include the following parts. (a) (12 points) Briefly report on each of the steps you took to develop a good prediction model. There is no need to include all of the details for each step of your investigation, but you should write one or two sentences describing what you learned from each step of your investigation. If you think it is important to include a graphical display or table to make your point please include the graph or table in your report. (b) (12 points) Report the R2 value, the ANOVA table, and a table of parameters estimates for the prediction model you think is best. Interpret the parameters in your model. You may have found more than one good model in part (a), and you can comment on those models, but you only to need to talk about the results for one of those models in this part. (c) (12 points) Use graphical displays and any other diagnostic methods covered in this class to assess how well you model fits the data and to assess if there are any problems. Comment on what you discover. (d) (4 points) Someone suggests that you could assess how well the model you reported in part (b) performs by applying it to the data for the 79 women in the study, because the data for these women were not used to fit the model you reported in part (b). You could use the model you presented in part (b) to predict the GPA for each of the 79 women and then compare those predictions to the actual GPAs achieved by those women. Do you think this is a good suggestion? Explain. 11 qattachments_fae608467f68d7d7d0643d294980e341de48ef5f GPA HSM HSS HSE SATM SATV 5.32 10 10 10 670 600 3.84 9 6 6 610 390 4.26 6 8 5 700 640 4.35 8 6 8 640 530 5.72 7 8 7 550 500 4.08 9 10 7 670 600 5.38 8 9 8 540 580 2.4 6 6 7 560 690 5.5 8 7 8 630 500 5.29 10 8 8 760 630 4.83 6 7 7 690 440 4.88 9 7 6 690 460 5.06 8 6 5 540 400 5.21 8 8 7 600 400 4 3 7 6 460 530 5.18 9 10 8 670 450 4.77 6 5 9 590 440 4.34 7 7 6 570 480 4.26 5 7 7 530 440 4.03 6 7 9 540 610 5.08 9 10 6 491 488 5.34 5 9 7 600 600 3.4 6 8 8 510 530 3.43 10 9 9 750 610 4.48 8 9 6 650 460 5.73 10 10 9 720 630 4.43 7 10 10 530 560 5.8 10 10 9 760 500 6 9 10 10 640 480 6 9 9 8 800 610 4 9 6 5 640 670 5.74 9 10 9 750 700 4.32 9 7 8 520 440 4.63 10 10 6 640 500 4.79 8 8 7 610 530 5.7 8 10 8 520 410 3.66 8 4 3 590 470 5.41 9 9 9 520 490 5.21 7 9 8 505 435 5.08 9 10 8 559 607 4.81 9 7 4 559 488 4.12 7 7 8 559 545 5.33 7 6 7 500 460 5.75 10 9 9 760 620 3.69 8 7 7 490 390 3.93 8 6 8 590 510 5.16 10 9 8 640 490 4.73 9 8 7 520 360 4.46 6 7 7 490 370 5.06 8 10 10 580 460 3.07 7 8 6 700 520 5.35 10 10 10 620 570 Page 1 qattachments_fae608467f68d7d7d0643d294980e341de48ef5f 3.82 3.59 3.14 2.65 4.12 5.7 4.82 5.12 4.25 4.93 4.34 5.16 4.11 3.34 4.96 4.83 5.1 5.07 4.53 4.87 4.75 5.61 5.14 4.25 3 4.79 4.17 3.95 4.39 4.15 2.75 5.35 5.58 5.06 4.75 4.5 4.78 4.64 5.26 5.09 5.41 4.44 3.11 5 5.67 5.12 5.81 4.17 2.12 5.3 4.3 4 5.22 6 8 10 9 7 10 4 10 9 10 8 9 6 6 9 1
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started