Question

1 Approved Answer

Posted on Oct 10, 2024


StatCrunch Assignment 1 - Part B (1B)

This page explains the assignment. The questions you need to answer are on the next page.

A key part of research involves formulating interesting questions, then developing a methodology and collecting appropriate data to answer those questions. The final project in this course will follow this usual path for research. For this assignment, however, the data is already available and you must write questions that could have been in the mind of the survey developer that led to the collection of this data. For convenience, the survey items are shown below.

Here are a variety of questions (from the StatCrunch website) that could be answered using the data collected by the student survey:

The items below ask you to formulate a question, describe your methodology for answering the question, carry out the methodology you described, then answer your question. Although not as detailed, this mirrors the format used in many research reports such as the one due at the end of the term for this course, in RSCH 202, and capstone courses for some degree programs.

Respond to the following items.

1. In Part A of this assignment, you selected a random sample of 30 StatCrunch U students and created a StatCrunch file containing data from the above survey for those students. You will use the StatCrunch data file you created in Part A to complete this assignment. Following the instructions at the end of Part A, paste your StatCrunch data file in the space below. This will allow your instructor to see your data. Be sure that your data is properly aligned in columns.
PASTE YOUR SAMPLE DATA HERE:

ID     Gender  Class  Hours  Work  Loans  CC Debt
1088   Male    1      12     0     0      1333
1251   Female  3      15     14.5  12674  3875
8629   Male    2      6      36.5  8467   0
10558  Male    2      8      28.5  0      6249
11812  Female  1      12     20    2813   1127
12049  Female  2      12     0     0      1650
16897  Male    4      16     0     0      9424
17803  Female  1      15     0     0      1394
19512  Female  1      15     0     3743   1204
22384  Female  1      13     0     0      1258
24218  Male    2      15     0     0      2979
24427  Female  1      15     17    0      1281
24941  Male    2      11     25.5  0      2592
26468  Female  3      8      29.5  12307  4042
29081  Female  4      17     0     0      6876
30926  Male    3      15     0     0      3459
31324  Female  4      15     0     0      5493
31848  Male    4      14     0     0      6661
33458  Female  4      18     0     14521  0
33483  Female  1      17     0     5325   922
34152  Female  1      20     0     3629   1082
37088  Female  1      15     0     0      1093
37115  Female  1      15     0     0      1037
37144  Male    4      13     21    15416  4665
37412  Female  2      15     23    0      2621
38917  Female  4      13     0     14899  7761
39127  Male    4      16     0     0      3746
40834  Female  1      15     0     5586   1283
43199  Male    2      14     17    8215   3082
43427  Male    2      11     0     0      3439

2a. State a single question related to a categorical variable that can be answered using your survey data and the techniques you have studied thus far in this course. You are encouraged to develop your own question, but you may use an appropriate question from the StatCrunch website. The question you use should be about the entire population of StatCrunch U students, not just about those in the sample. Assume that your sample is representative of the population.

Question: Is the distribution of males and females equal among freshmen, sophomores, juniors, and seniors?

b. Explain the methodology you will use to answer the question you posed. Your explanation should include answers to the following questions. Do not include your analysis or answers to your question here; only describe how you will do the analysis.

What is the variable of interest? The variable is gender across student classification.

What graphical techniques will you use to describe your data? A bar chart is used to describe the data.
Why are those techniques appropriate for analyzing this data?

c. Carry out the methodology described in b above. Use StatCrunch and paste copies of the graphs/charts from StatCrunch in the space below.

Gender * Classification Crosstabulation (Count)

                      Freshman  Sophomore  Junior  Senior  Total
Female                10        2          2       4       18
Male                  1         6          1       4       12
Total                 11        8          3       8       30
Proportion of female  0.91      0.25       0.67    0.5

[Bar chart: proportion of female students by student classification (Freshman, Sophomore, Junior, Senior); vertical axis from 0 to 1]

d. Based on the results of b and c above, answer your question. Include an explanation of how you used the graphs and charts to formulate your answer.

From the table and chart above, we can see that the distribution of gender varies across the student classifications: the proportion of females is highest among freshmen (0.91) and lowest among sophomores (0.25). Because the sample size is 30, the sample should give a reasonable estimate (Central Limit Theorem), so we generalize to the whole population that the distribution of males and females varies across student classifications.

3a. State a single question related to a numerical variable that can be answered using your survey data and the techniques you have studied thus far in this course. You are encouraged to develop your own question, but you may use an appropriate question from the StatCrunch website. The question you use should be about the entire population of StatCrunch U students, not just about those in the sample. Assume that your sample is representative of the population.

Question: What is the mean number of credit hours per student? Is the mean number of credit hours less than 15 for both males and females? Do the mean credit hours for males and females differ significantly at the 0.05 level of significance?

b. Explain the methodology you will use to answer the question you posed. Your explanation should include answers to the following questions.
Do not include your analysis or answers to your question here; only describe how you will do the analysis.

What is the variable of interest? Variable: credit hours.

What graphical techniques will you use to describe your data? To compare the mean credit hours across gender, we can use a bar graph.

Why are those techniques appropriate for analyzing this data? A bar graph is a good technique for analyzing whether the mean credit hours differ across gender and whether the mean credit hours of males or of females is greater than or less than 15.

What numerical measures will you use to describe your data? Numerical measure: the mean credit hours of males and of females.

Why are those measures appropriate for analyzing this data? Because we are comparing whether credit hours vary significantly with respect to gender.

c. Carry out the methodology described in b above. Use StatCrunch and paste copies of the graphs/charts and numerical summaries from StatCrunch in the space below.

[Bar graph of mean credit hours of male and female students; vertical axis "Mean Credit hour" from 11.5 to 15; categories Female and Male]

We will use an independent-samples t-test to determine whether the mean credit hours for males and females differ.

Group Statistics (Credit hours)

Gender  N   Mean   Std. Deviation  Std. Error Mean
Male    12  12.58  3.147           .908
Female  18  14.72  2.608           .615

Independent Samples Test (Credit hours)

Levene's Test for Equality of Variances: F = 1.184, Sig. = .286

t-test for Equality of Means (95% Confidence Interval of the Difference in the last two columns)

                             t       df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower   Upper
Equal variances assumed      -2.027  28      .052             -2.139           1.055                  -4.301  .023
Equal variances not assumed  -1.950  20.587  .065             -2.139           1.097                  -4.423  .145

d. Based on the results of b and c above, answer your question. Include an explanation of how you used the graphs, charts, and numerical summaries to formulate your answer.
Analysis from graph: From the bar graph it is clear that the mean credit hours for both males and females is less than 15, because neither bar reaches the y-axis value of 15. However, the female mean, 14.72, is close to 15.

Do mean credit hours vary significantly across gender?

Null hypothesis H0: μ(male) = μ(female)
Alternative hypothesis Ha: μ(male) ≠ μ(female)
where μ is the mean credit hours.

Consider the null hypothesis that the mean credit hours for males and females are equal against the alternative hypothesis that the mean credit hours differ, at the 0.05 level of significance. From the numerical summaries above, the p-value of 0.0523 is greater than 0.05, so we fail to reject the null hypothesis: there is not sufficient evidence that mean credit hours differ by gender.

StatCrunch Assignment 2

First save this file to your computer. Answer each of the following questions, then resave the file along with your answers and turn it in using the assignment link in Module 3, Activity 4. The first four problems are worth 10 points each. Problems 5 and 6 are worth 30 points each.

1. If the original sample is 48, 55, 43, 61, 39, which of the following would not be a possible bootstrap sample? Explain why it wouldn't be.
The answer is b, because 56 is not in the original sample.
a) 48, 55, 43, 61, 39
b) 43, 39, 56, 43, 61
c) 55, 48, 55, 48, 61
d) 39, 39, 39, 39, 39

2. The following bootstrap output is for mileage of a random sample of 25 Mustang cars. Based on a 90% confidence interval, which of the following would not be a plausible value of the population mean, μ? Explain why it wouldn't be.
52.096 to 80.012 would be the 90% confidence interval.
a) 60.01
b) 80.01
c) 55
d) 52

3. Which of the following p-values would provide more evidence in support of the alternate hypothesis and against the null hypothesis? Explain your answer.
0.009, because it is the lowest p-value. A p-value this small supports the alternate hypothesis: there is very little probability that a difference as large as that in the original data is due to chance alone.
a) 0.15
b) 0.85
c) 0.009
d) 0.05

4. In a sample of 56 addicts, 28 were given a new drug to help them overcome their addiction, while the remaining 28 were given the current drug. The following is the output from a randomization test for the difference in the proportion of addicts suffering relapses.
a) What is the difference in proportions in the original data? 0.42857143
b) What is the p-value and what does it mean in terms of this specific problem? .37%. This means that about .37% of the simulations resulted in differences in proportions as extreme as or more extreme than the difference in the original sample.

In StatCrunch Assignment 1A, you selected a random sample of 30 StatCrunch U students and gathered some survey data regarding those students. Use the StatCrunch data file you created to complete the remainder of this assignment.

5. Construct a bootstrap distribution of the credit card debt data from your sample using 3000 resamples.
a) Paste a copy of your distribution here. (With a PC, you can press the control-alt-fn-F11 keys to copy the window showing the distribution.)
b) What is the mean of your original sample? The original mean was 3054.2667.
c) What is the 95% confidence interval estimate of the population mean obtained from your bootstrap distribution? 3793.45

6. Suppose you want to compare the amount of credit card debt for males and females. The null hypothesis for this problem would be no difference in the mean amount of credit card debt for males and females, and the alternate hypothesis would be that there is a difference in mean amounts of credit card debt. Conduct a randomization test for two means with 3000 randomizations. (Your input menu should be as follows.)
a) Paste the output from your randomization test here.
b) What is the mean difference in credit card debt of the two groups in the original data? Males = 2966.58333, Females = 3112.722222, with a difference of -146.138889.
c) What p-value did you get in your randomization? Explain in the context of the problem what the p-value means. The p-value after 3000 randomizations is 0.0923. There is a difference in means of -1524.69 or below, or 1524.69 and above, in 95% of random samples. Approximately 95% of the time, there is a difference in the mean CC debt between the males and females compared to the original sample.
d) Do you think the data support the null hypothesis of no difference in mean credit card debt between males and females or the alternate hypothesis that there is a difference? Explain your answer. Yes, I do believe this supports the null hypothesis due to the high p-value. There is a large probability of getting a difference in means of -1524.69 or below, or 1524.69 and above.

StatCrunch Assignment 3

In this assignment, you will use the StatCrunch U data set that you developed in Module 2 as part of the first StatCrunch assignment. As you did in StatCrunch Assignment 1B, look at the items in the StatCrunch U survey and develop a question regarding population proportions that can be answered using the survey data you collected. As a reminder, here are the items on the survey:

Here are some possible questions. Feel free to pick one of those questions, but it would be better for you to formulate a question of your own. For this assignment, you need a question related to a single proportion (since our sample data is drawn from a single population) and one that will require the use of a confidence interval and hypothesis test to answer.

NOTE: There is a sample StatCrunch Assignment 3 designed to give you an idea of how to do this assignment. It uses a question related to the proportions of female and male students at StatCrunch U. You cannot use that question on this assignment.
Is the proportion of female students at StatCrunch U different from the nationwide proportion of female students as reported by USA Today?
Is the proportion of seniors less than 25% of the student body?
Is the proportion of students who work different from the nationwide proportion of students who work, reported as 71% by a 2013 U.S. Census report?
Is the proportion of students with student loans greater than the proportion of students without student loans? (This can be treated as a single proportion problem from the point of view that all students either do or do not have a student loan.)

Answer each of the following questions. Point values are indicated with each question.

1. (20 pts.) State your question. Remember that your question should be related to the population proportion or proportions and should be one that will require the use of a confidence interval and hypothesis test to answer. Assume that your sample is representative of the population.

2. (30 pts.) Explain the methodology you will use to answer the question you posed. Your explanation should include answers to the following questions. Do not include your analysis or answers to your question here; only describe how you will do the analysis.
What is the variable of interest?
What confidence interval will you use?
What are your null and alternate hypotheses?
Is it a one-sample or two-sample test?
Is it an upper (right)-, lower (left)-, or two-tail test?
What level of significance will you use and why?
Are the conditions necessary for a confidence interval and hypothesis test for the population proportion satisfied? Explain.

3. (30 pts.) Carry out the methodology described in 2 above. Use StatCrunch and paste copies of the StatCrunch output in the space below. (NOTE: For the purposes of this assignment, go ahead and complete the confidence interval and hypothesis test even if there are not at least 10 successes and 10 failures.)
Your explanation should include answers to the following questions:
What are the upper and lower bounds of the confidence interval?
What is the error term in the confidence interval?
What does the confidence interval mean in terms of the question you posed?
What is the p-value in your hypothesis test and what does it mean in terms of the question you posed?
Did you reject or not reject the null hypothesis and why?
What is the conclusion of your hypothesis test in terms of the question you posed?
Do the results of the confidence interval and hypothesis test agree? Explain.

4. (20 pts.) Based on the results of 2 and 3 above, answer your question. Include an explanation of how you used the StatCrunch output to formulate your answer.

StatCrunch Assignment 3 Example

In this assignment, you will use the StatCrunch U data set that you developed in Module 2 as part of the first StatCrunch assignment. As you did in StatCrunch Assignment 1B, look at the items in the StatCrunch U survey and develop a question regarding population proportions that can be answered using the survey data you collected. As a reminder, here are the items on the survey:

Here are some possible questions. Feel free to pick one of those questions, but it would be better for you to formulate a question of your own. For this assignment, you need a question related to a single proportion (since our sample data is drawn from a single population) and one that will require the use of a confidence interval and hypothesis test to answer.

NOTE: There is a sample StatCrunch Assignment 3 designed to give you an idea of how to do this assignment. It uses a question related to the proportions of female and male students at StatCrunch U. You cannot use that question on this assignment.

Is the proportion of female students at StatCrunch U different from the nationwide proportion of female students as reported by USA Today?
Is the proportion of seniors less than 25% of the student body?
Is the proportion of students who work different from the nationwide proportion of students who work, reported as 71% by a 2013 U.S. Census report?
Is the proportion of students with student loans greater than the proportion of students without student loans? (This can be treated as a single proportion problem from the point of view that all students either do or do not have a student loan.)

Example answers follow the questions. Point values are indicated with each question.

1. (20 pts.) State your question. Remember that your question should be related to the population proportion or proportions and should be one that will require the use of a confidence interval and hypothesis test to answer. Assume that your sample is representative of the population.

According to a USA Today article in 2010, 57 percent of U.S. college students are female (http://www.usatoday.com/news/education/2010-01-26-genderequity26_ST_N.htm). Is the proportion of female students at StatCrunch U greater than the nationwide proportion as reported by USA Today?

2. (30 pts.) Explain the methodology you will use to answer the question you posed. Your explanation should include answers to the following questions. Do not include your analysis or answers to your question here; only describe how you will do the analysis.

What is the variable of interest? The proportion of female students at StatCrunch U.

What confidence interval will you use? I will use a 95% confidence interval for the population proportion.

What are your null and alternate hypotheses?
H0: p = .57
HA: p > .57

Is it a one-sample or two-sample test? This is a one-sample test for the population proportion.

Is it an upper (right)-, lower (left)-, or two-tail test? Because the sign of the inequality in the alternate hypothesis is >, this will be a right- or upper-tail test.

What level of significance will you use and why? A .05 level of significance will be used.
This coincides with a 95% confidence interval and is appropriate based on the consequences of a Type I error.

Are the conditions necessary for a confidence interval and hypothesis test for the population proportion satisfied? Explain. In my sample of size 30, there are 14 females and 16 males. The proportion of females in the sample is 14/30 = .467. The conditions are shown on pages 312 and 348. They are satisfied because:
1. Random sample: The sample was randomly selected from the population of StatCrunch U students.
2. For the confidence interval, there should be at least 10 successes and 10 failures in the sample. In this case there are 14 females (successes) and 16 males (failures) in the sample. For the hypothesis test there must be at least 10 expected successes and 10 expected failures in the sample if the null hypothesis is true. In this case n*p0 = 30*0.57 = 17.1 and n*(1 - p0) = 30*0.43 = 12.9, so the condition is satisfied.
3. Because sampling was done without replacement, the population must be at least 10 times larger than the sample. This is satisfied because the population size is approximately 46,000 and the sample size is 30.
4. Items in the sample were independently selected.
5. I am assuming that the null hypothesis is true.

3. (30 pts.) Carry out the methodology described in 2 above. Use StatCrunch and paste copies of the StatCrunch output in the space below. (NOTE: For the purposes of this assignment, go ahead and complete the confidence interval and hypothesis test even if there are not at least 10 successes and 10 failures.) Your explanation should include answers to the following questions:

What are the upper and lower bounds of the confidence interval? StatCrunch output for the confidence interval is shown below. The lower and upper bounds of the confidence interval are 0.288 and 0.645.

What is the error term in the confidence interval? The error term in the confidence interval is (.6452 - .2881)/2 = .1785.
What does the confidence interval mean in terms of the question you posed? I am 95% confident that the population proportion of female students at StatCrunch U is between .2881 and .6452.

What is the p-value in your hypothesis test and what does it mean in terms of the question you posed? The StatCrunch output for the hypothesis test is shown below. The p-value is .8735. That means if the null hypothesis that the proportion of females in the population is .57 is true, then the probability of getting a sample proportion of females of .467 or more in a sample of size 30 is .8735.

Did you reject or not reject the null hypothesis and why? The null hypothesis would not be rejected because the p-value of .8735 is greater than the level of significance of .05.

What is the conclusion of your hypothesis test in terms of the question you posed? There is no evidence that the proportion of females at StatCrunch U is greater than 0.57.

Do the results of the confidence interval and hypothesis test agree? Explain. The 95% confidence interval is .2881 to .6452 and we did not reject the null hypothesis that the proportion of females equals .57 at the .05 level of significance. The two agree because .57 is in the 95% confidence interval.

4. (20 pts.) Based on the results of 2 and 3 above, answer your question. Include an explanation of how you used the StatCrunch output to formulate your answer.

The question asked whether the proportion of female students at StatCrunch U is greater than the nationwide proportion of .57 as reported by USA Today. The 95% confidence interval contains .57, and the null hypothesis that the proportion of female students at StatCrunch U equals .57 was not rejected. Therefore there is no evidence that the proportion of female students at StatCrunch U is greater than .57.
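The numbers in the example answer can be checked by hand. The sketch below assumes the standard normal-approximation formulas for a single proportion (a Wald confidence interval and a one-sample z-test); the source does not show which StatCrunch menu options were used, so treat this as an illustration rather than a copy of the StatCrunch procedure. The function name and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_summary(successes, n, p0, confidence=0.95):
    """Wald confidence interval and upper-tail z-test for one proportion.

    Assumes the normal approximation is adequate (see the conditions
    listed in item 2 of the example).
    """
    std_normal = NormalDist()
    p_hat = successes / n

    # The confidence interval uses the sample proportion in the standard error.
    z_star = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)

    # The hypothesis test uses the null value p0 in the standard error.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - std_normal.cdf(z)  # HA: p > p0, so take the upper tail

    return {
        "p_hat": p_hat,
        "lower": p_hat - margin,
        "upper": p_hat + margin,
        "margin": margin,
        "z": z,
        "p_value": p_value,
    }


# 14 females in a sample of 30, tested against the nationwide value of .57
result = one_proportion_summary(successes=14, n=30, p0=0.57)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With 14 successes out of 30 and p0 = .57, this gives an interval of roughly .2881 to .6452, an error term of about .1785, and an upper-tail p-value of about .8735, in line with the values quoted in the example.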
ID     Gender  Class  Hours  Work  Loans  CC Debt
1088   Male    1      12     0     0      1333
1251   Female  3      15     14.5  12674  3875
8629   Male    2      6      36.5  8467   0
10558  Male    2      8      28.5  0      6249
11812  Female  1      12     20    2813   1127
12049  Female  2      12     0     0      1650
16897  Male    4      16     0     0      9424
17803  Female  1      15     0     0      1394
19512  Female  1      15     0     3743   1204
22384  Female  1      13     0     0      1258
24218  Male    2      15     0     0      2979
24427  Female  1      15     17    0      1281
24941  Male    2      11     25.5  0      2592
26468  Female  3      8      29.5  12307  4042
29081  Female  4      17     0     0      6876
30926  Male    3      15     0     0      3459
31324  Female  4      15     0     0      5493
31848  Male    4      14     0     0      6661
33458  Female  4      18     0     14521  0
33483  Female  1      17     0     5325   922
34152  Female  1      20     0     3629   1082
37088  Female  1      15     0     0      1093
37115  Female  1      15     0     0      1037
37144  Male    4      13     21    15416  4665
37412  Female  2      15     23    0      2621
38917  Female  4      13     0     14899  7761
39127  Male    4      16     0     0      3746
40834  Female  1      15     0     5586   1283
43199  Male    2      14     17    8215   3082
43427  Male    2      11     0     0      3439
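As a rough check on the Assignment 1B summaries and the bootstrap step in Assignment 2, the sketch below recomputes a few figures from the sample data listed above. It rests on two assumptions: the column alignment (ID, Gender, Class, Hours, Work, Loans, CC Debt) was reconstructed from a flattened listing, and class codes 1 to 4 are taken to mean Freshman, Sophomore, Junior, Senior. The helper names, the fixed seed, and the percentile method for the bootstrap interval are illustrative choices, not StatCrunch's own procedure.

```python
import random
from collections import Counter
from math import sqrt
from statistics import mean, variance

# (gender, class code, credit hours, credit card debt) for the 30 students.
# Column alignment is an assumption; see the note above.
SAMPLE = [
    ("Male", 1, 12, 1333), ("Female", 3, 15, 3875), ("Male", 2, 6, 0),
    ("Male", 2, 8, 6249), ("Female", 1, 12, 1127), ("Female", 2, 12, 1650),
    ("Male", 4, 16, 9424), ("Female", 1, 15, 1394), ("Female", 1, 15, 1204),
    ("Female", 1, 13, 1258), ("Male", 2, 15, 2979), ("Female", 1, 15, 1281),
    ("Male", 2, 11, 2592), ("Female", 3, 8, 4042), ("Female", 4, 17, 6876),
    ("Male", 3, 15, 3459), ("Female", 4, 15, 5493), ("Male", 4, 14, 6661),
    ("Female", 4, 18, 0), ("Female", 1, 17, 922), ("Female", 1, 20, 1082),
    ("Female", 1, 15, 1093), ("Female", 1, 15, 1037), ("Male", 4, 13, 4665),
    ("Female", 2, 15, 2621), ("Female", 4, 13, 7761), ("Male", 4, 16, 3746),
    ("Female", 1, 15, 1283), ("Male", 2, 14, 3082), ("Male", 2, 11, 3439),
]

# Assumed meaning of the class codes.
CLASS_NAMES = {1: "Freshman", 2: "Sophomore", 3: "Junior", 4: "Senior"}

# Gender by classification counts (the crosstab in Assignment 1B, item 2c).
crosstab = Counter((g, CLASS_NAMES[c]) for g, c, _, _ in SAMPLE)

# Credit hours by gender and the pooled (equal-variance) t statistic.
male_hours = [h for g, _, h, _ in SAMPLE if g == "Male"]
female_hours = [h for g, _, h, _ in SAMPLE if g == "Female"]


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


t_stat = pooled_t(male_hours, female_hours)


def bootstrap_mean_ci(values, resamples=3000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean (resampling with replacement)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(resamples))
    lo_idx = int((1 - confidence) / 2 * resamples)
    hi_idx = int((1 + confidence) / 2 * resamples) - 1
    return means[lo_idx], means[hi_idx]


cc_debt = [d for _, _, _, d in SAMPLE]
ci_low, ci_high = bootstrap_mean_ci(cc_debt)

print("Female freshmen:", crosstab[("Female", "Freshman")])
print("Mean hours (male, female):", round(mean(male_hours), 2), round(mean(female_hours), 2))
print("Pooled t:", round(t_stat, 3))
print("Mean CC debt:", round(mean(cc_debt), 4))
print("Bootstrap 95% CI for mean CC debt:", round(ci_low, 2), "to", round(ci_high, 2))
```

Under these assumptions the counts match the crosstab (for example, 10 female freshmen and 6 male sophomores), the mean credit hours come out to about 12.58 for males and 14.72 for females with a pooled t statistic near -2.027, and the mean credit card debt is about 3054.27. The bootstrap endpoints depend on the random resamples, so they will differ somewhat from any StatCrunch run.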