
Location  Income ($1,000)  Size  Years  Credit Balance ($)
Urban     27               1     2      2,631
Rural     25               4     2      2,047
Suburban  25               1     1      3,155
Suburban  26               1     2      3,913
Rural     30               5     5      2,660
Urban     29               1     3      3,531
Rural     33               6     10     2,766
Urban     30               1     4      3,769
Suburban  32               2     4      4,082
Urban     34               1     6      3,806
Urban     35               1     8      4,049
Urban     40               1     9      4,073
Rural     30               6     9      2,697
Rural     33               6     11     2,914
Urban     42               2     10     4,073
Suburban  32               2     4      4,310
Urban     43               2     10     4,199
Urban     43               2     10     4,253
Rural     33               7     13     3,104
Urban     47               2     10     4,293
Suburban  35               3     5      4,456
Urban     54               2     11     4,340
Suburban  42               3     5      4,925
Rural     36               7     13     3,178
Urban     57               3     11     4,391
Suburban  44               3     6      4,947
Rural     38               7     15     3,203
Urban     54               3     8      4,354
Urban     54               3     10     4,366
Suburban  46               4     6      5,003
Rural     40               7     15     3,250
Urban     60               4     11     4,402
Urban     58               4     10     4,397
Urban     61               5     13     4,595
Urban     61               5     13     4,786
Urban     62               6     14     4,888
Suburban  49               5     8      5,148
Urban     68               6     14     5,011
Suburban  57               6     8      5,220
Rural     45               8     16     3,257
Urban     71               7     15     5,528
Suburban  57               7     9      5,283
Suburban  64               8     9      5,332
Rural     45               8     17     3,304
Urban     74               7     19     5,553
Suburban  65               8     10     5,484
Rural     47               8     18     3,342
Rural     53               8     18     3,788
Suburban  66               8     10     5,756
Suburban  69               8     10     5,861

MATH533 Course Project, Demonstrations of Parts A, B, C
Part A Demo: http://join.adobeconnect.com/p62r6l121rc/
Part B Demo: http://join.adobeconnect.com/p53b3omnrg4/
Part C Demo: http://join.adobeconnect.com/p11z4cpq1f0/

Course Project: AJ DAVIS DEPARTMENT STORES

Introduction

AJ DAVIS is a department store chain that has many credit customers and wants to find out more information about these customers. A sample of 50 credit customers is selected, with data collected on the following five variables.

1. Location (rural, urban, suburban)
2. Income (in $1,000s; be careful with this)
3. Size (household size, meaning the number of people living in the household)
4. Years (the number of years that the customer has lived in the current location)
5. Credit balance (the customer's current credit card balance on the store's credit card, in $)

The data is available in Doc Sharing Course Project Data Set as an Excel file.
You are to copy and paste the data set into a minitab worksheet. PROJECT PART A: Exploratory Data Analysis Open the file MATH533 Project Consumer.xls from the Course Project Data Set folder in Doc Sharing. For each of the five variables, process, organize, present, and summarize the data. Analyze each variable by itself using graphical and numerical techniques of summarization. Use minitab as much as possible, explaining what the printout tells you. You may wish to use some of the following graphs: stem-leaf diagram, frequency or relative frequency table, histogram, boxplot, dotplot, pie chart, bar graph. Caution: Not all of these are appropriate for each of these variables, nor are they all necessary. More is not necessarily better. In addition, be sure to find the appropriate measures of central tendency and measures of dispersion for the above data. Where appropriate use the five number summary (the Min, Q1, Median, Q3, Max). Once again, use minitab as appropriate, and explain what the results mean. Analyze the connections or relationships between the variables. There are 10 pairings here (location and income, location and size, location and years, location and credit balance, income and size, income and years, income and balance, size and years, size and credit balance, years and Credit Balance). Use graphical as well as numerical summary measures. Explain what you see. Be sure to consider all 10 pairings. Some variables show clear relationships, while others do not. Prepare your report in Microsoft Word (or some other word processing package), integrating your graphs and tables with text explanations and interpretations. Be sure that you have graphical and numerical back up for your explanations and interpretations. Be selective in what you include in the report. I'm not looking for a 20-page report on every variable and every possible relationship (that's 15 things to do). 
Rather, what I want you do is to highlight what you see for three individual variables (no more than one graph for each, one or two measures of central tendency and variability (as appropriate), and two or three sentences of interpretation). For the 10 pairings, identify and report only on three of the pairings, again using graphical and numerical summary (as appropriate), with interpretations. Please note that at least one of your pairings must include location and at least one of your pairings must not include location. All DeVry University policies are in effect, including the plagiarism policy. Project Part A report is due by the end of Week 2. Project Part A is worth 100 total points. See grading rubric below. Submission: The report from Part 4, including all relevant graphs and numerical analysis along with interpretations Format for report: A. Brief introduction B. Discuss your first individual variable, using graphical, numerical summary, and interpretation C. Discuss your second individual variable, using graphical, numerical summary, and interpretation D. Discuss your third individual variable, using graphical, numerical summary, and interpretation E. Discuss your first pairing of variables, using graphical, numerical summary, and interpretation F. Discuss your second pairing of variables, using graphical, numerical summary, and interpretation G. Discuss your third pairing of variables, using graphical, numerical summary, and interpretation H. 
Conclusion

Project Part A: Grading Rubric

Category                                     Points  %    Description
Three individual variables (12 points each)  36      36   graphical analysis, numerical analysis (when appropriate), and interpretation
Three relationships (15 points each)         45      45   graphical analysis, numerical analysis (when appropriate), and interpretation
Communication skills                         19      19   writing, grammar, clarity, logic, cohesiveness, adherence to the above format
Total                                        100     100  A quality paper will meet or exceed all of the above requirements.

Project Part B: Hypothesis Testing and Confidence Intervals

Your manager has speculated the following.

a. The average (mean) annual income was greater than $45,000.
b. The true population proportion of customers who live in a suburban area is less than 45%.
c. The average (mean) number of years lived in the current home is greater than 8 years.
d. The average (mean) credit balance for rural customers is less than $3,200.

1. Using the sample data, perform the hypothesis test for each of the above situations in order to see if there is evidence to support your manager's belief in each case A-D. In each case, use the Seven Elements of a Test of Hypothesis in Section 6.2 of your text book with \(\alpha = .05\), and explain your conclusion in simple terms. Also, be sure to compute the p-value and interpret.
2. Follow this up with computing 95% confidence intervals for each of the variables described in A-D, and again interpreting these intervals.
3. Write a report to your manager about the results, distilling down the results in a way that would be understandable to someone who does not know statistics. Clear explanations and interpretations are critical.
4. All DeVry University policies are in effect, including the plagiarism policy.
5. Project Part B report is due by the end of Week 6.
6. Project Part B is worth 100 total points. See the grading rubric below.
Submission: The report from Part 3 and all of the relevant work done in the hypothesis testing (including Minitab) in Part 1 and the confidence intervals (Minitab) in Part 2 as an appendix

Format for report:
A. Summary report (about one paragraph on each of the speculations, A-D)
B. Appendix with all of the steps in hypothesis testing (the format of the Seven Elements of a Test of Hypothesis, in Section 6.2 of your text book) for each speculation A-D, as well as the confidence intervals, including all Minitab output

Project Part B: Grading Rubric

Category                                      Points  %    Description
Addressing each speculation (20 points each)  80      80   hypothesis test, interpretation, confidence interval, and interpretation
Summary report                                20      20   one paragraph on each of the speculations
Total                                         100     100  A quality paper will meet or exceed all of the above requirements.

Project Part C: Regression and Correlation Analysis

Using MINITAB, perform the regression and correlation analysis for the data on income (Y), the dependent variable, and credit balance (X), the independent variable, by answering the following.

1. Generate a scatterplot for income ($1,000) versus credit balance ($), including the graph of the best fit line. Interpret.
2. Determine the equation of the best fit line, which describes the relationship between income and credit balance.
3. Determine the coefficient of correlation. Interpret.
4. Determine the coefficient of determination. Interpret.
5. Test the utility of this regression model (use a two tail test with \(\alpha = .05\)). Interpret your results, including the p-value.
6. Based on your findings in 1-5, what is your opinion about using credit balance to predict income? Explain.
7. Compute the 95% confidence interval for beta-1 (the population slope). Interpret this interval.
8. Using an interval, estimate the average income for customers that have credit balance of $4,000. Interpret this interval.
9. Using an interval, predict the income for a customer that has a credit balance of $4,000.
Interpret this interval.
10. What can we say about the income for a customer that has a credit balance of $10,000? Explain your answer.

In an attempt to improve the model, we attempt to do a multiple regression model predicting income based on credit balance, years, and size.

11. Using MINITAB, run the multiple regression analysis using the variables credit balance, years, and size to predict income. State the equation for this multiple regression model.
12. Perform the global test for utility (F-test). Explain your conclusion.
13. Perform the t-test on each independent variable. Explain your conclusions and clearly state how you should proceed. In particular, state which independent variables we should keep and which should be discarded.
14. Is this multiple regression model better than the linear model that we generated in parts 1-10? Explain.

All DeVry University policies are in effect, including the plagiarism policy.
15. Project Part C report is due by the end of Week 7.
16. Project Part C is worth 100 total points. See the grading rubric below.

Summarize your results from 1-14 in a report that is 3 pages or less in length and explains and interprets the results in ways that are understandable to someone who does not know statistics.

Submission: The summary report + all of the work done in 1-14 (Minitab output + interpretations) as an appendix

Format:
A. Summary Report
B. Points 1-14 addressed with appropriate output, graphs, and interpretations. Be sure to number each point 1-14.

Project Part C: Grading Rubric

Category                                  Points  %    Description
Questions 1-12 and 14 (5 points each)     65      65   addressed with appropriate output, graphs, and interpretations
Question 13                               15      15   addressed with appropriate output, graphs, and interpretations
Summary                                   20      20   writing, grammar, clarity, logic, and cohesiveness
Total                                     100     100  A quality paper will meet or exceed all of the above requirements.
Project Part B: Hypothesis Testing and Confidence Intervals (SAMPLE)

Your manager has speculated the following:

a. the average (mean) annual income was greater than $45,000.

Ans: The claim is that the mean annual income is greater than $45,000. Let \(\mu\) be the population mean annual income, \(\bar{x}\) the sample mean income, and \(\sigma\) the population standard deviation of income. We want to test H0: \(\mu \le 45\) against H1: \(\mu > 45\), with units in $1,000.

The alternative is one-sided, so this is a one-tail test. Because the sample size is 50 (larger than 30), the central limit theorem lets us treat it as a large sample, so we use a one-sample Z test for the mean instead of a t test. The critical value at the 5% level of significance is \(Z_{0.05} = 1.645\) (from the table), so we reject the null hypothesis if the test statistic Z > 1.645. The test statistic is

\[ Z = \frac{\bar{x} - 45}{\sigma / \sqrt{n}} \]

where n is the sample size.

Assuming a population standard deviation of 14.64, the MINITAB output gives a test statistic of Z = 0.49 with a corresponding p-value of 0.311, and a 95% lower bound of 42.61.

Since the test statistic is not in the critical region (0.49 is not greater than 1.645), we fail to reject the null hypothesis. Failing to reject does not prove the null hypothesis true; it means that, at the 5% significance level, the sample does not provide enough evidence to support the claim that "the average mean annual income was greater than $45,000".

We can also test this hypothesis using a p-value approach. Note that the p-value for this test is 0.311.
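As a cross-check on speculation a, the Z statistic, one-sided p-value, and 95% lower bound can be reproduced outside Minitab. The following is a minimal Python sketch using only the standard library. It assumes the 50 income values from the data set and the population standard deviation of 14.64 that the sample answer assumes; the function name `z_test_mean_greater` is hypothetical and introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

# Income ($1,000) for the 50 sampled customers, in data-set order.
income = [27, 25, 25, 26, 30, 29, 33, 30, 32, 34,
          35, 40, 30, 33, 42, 32, 43, 43, 33, 47,
          35, 54, 42, 36, 57, 44, 38, 54, 54, 46,
          40, 60, 58, 61, 61, 62, 49, 68, 57, 45,
          71, 57, 64, 45, 74, 65, 47, 53, 66, 69]


def z_test_mean_greater(sample, mu0, sigma, alpha=0.05):
    """Upper-tail one-sample Z test (H1: mu > mu0) with sigma treated as known."""
    n = len(sample)
    se = sigma / sqrt(n)                          # standard error of the mean
    z = (mean(sample) - mu0) / se                 # test statistic
    p_value = 1.0 - NormalDist().cdf(z)           # upper-tail area
    z_crit = NormalDist().inv_cdf(1.0 - alpha)    # about 1.645 for alpha = 0.05
    lower_bound = mean(sample) - z_crit * se      # one-sided lower confidence bound
    return z, p_value, lower_bound


# sigma = 14.64 is the value assumed in the sample answer, not computed here.
z, p, lower = z_test_mean_greater(income, mu0=45, sigma=14.64)
print(f"Z = {z:.2f}, p-value = {p:.3f}, 95% lower bound = {lower:.2f}")
```

Under these assumptions the printed values should agree with the Minitab figures quoted above (Z = 0.49, p-value 0.311, lower bound 42.61).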
Using the p-value approach, we reject the null hypothesis if the p-value is smaller than the significance level. Here the p-value (0.311) is greater than the significance level of 0.05, so we again fail to reject the null hypothesis.

The 95% lower confidence bound in this case is 42.61 (in $1,000); we can be 95% confident that the population mean income is greater than $42,610. Because this bound lies below 45, the data are consistent with a mean at or below $45,000.

b. the true population proportion of customers who live in a suburban area is less than 45%.

Ans: We use a Z test here as well, for the same reason (large sample). Let p be the population proportion of customers who live in a suburban area and \(\hat{p}\) the sample proportion. We are interested in testing H0: \(p \ge 0.45\) against H1: \(p < 0.45\). The alternative is again one-sided, so this is a one-tail, one-sample proportion test. The critical value is \(Z_{0.05} = 1.645\), and we reject the null hypothesis if the test statistic Z < -1.645. The test statistic, with its standard error computed at the hypothesized value 0.45, is

\[ Z = \frac{\hat{p} - 0.45}{\sqrt{0.45\,(1 - 0.45)/n}} \]

where n is the sample size.

To run the test in MINITAB, Location must first be converted into a numerical value. After defining a new variable Sub (Sub = 1 if Location is Suburban, 0 otherwise), the MINITAB output gives a test statistic of Z = -2.13 with a corresponding p-value of 0.017, and a 95% upper bound of 0.406599.

Since the test statistic is in the critical region (-2.13 < -1.645), we reject the null hypothesis. At the 5% significance level, the sample data give enough evidence to support the claim that "the true population proportion of customers who live in a suburban area is less than 45%".
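The one-proportion test for speculation b can be sketched the same way. The Python snippet below uses only the standard library; the count of 15 suburban customers out of 50 is an assumption taken from tallying the Location column of the data set, and the helper name `z_test_proportion_less` is hypothetical. Note that the test statistic uses the hypothesized proportion in its standard error, while the confidence bound uses the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion_less(successes, n, p0, alpha=0.05):
    """Lower-tail one-sample Z test for a proportion (H1: p < p0)."""
    p_hat = successes / n
    # Test statistic: standard error evaluated at the hypothesized value p0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = NormalDist().cdf(z)                 # lower-tail area
    # Upper confidence bound: standard error evaluated at the sample proportion.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    upper_bound = p_hat + z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, upper_bound


# Assumption: 15 of the 50 sampled customers are suburban (tallied from the data set).
z, p, upper = z_test_proportion_less(successes=15, n=50, p0=0.45)
print(f"Z = {z:.2f}, p-value = {p:.3f}, 95% upper bound = {upper:.4f}")
```

With these inputs the sketch should land on the Minitab values reported for this speculation (Z = -2.13, p-value 0.017, upper bound about 0.4066).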
Again, the p-value for this test is 0.017, which is smaller than the significance level of 0.05, so we reject the null hypothesis under the p-value approach as well.

The 95% upper confidence bound in this case is 0.406599; we can be 95% confident that the true population proportion of customers who live in a suburban area is less than 40.66%.

c. the average (mean) number of years lived in the current home is greater than 8 years.

Ans: We want to test whether the average number of years lived in the current home is greater than 8 years. With \(\mu\) now denoting the population mean number of years, we test H0: \(\mu \le 8\) against H1: \(\mu > 8\). As before, we use a one-sample Z test for the mean. The critical value at the 5% level of significance is \(Z_{0.05} = 1.645\), and we reject the null hypothesis if the test statistic Z > 1.645. The test statistic is

\[ Z = \frac{\bar{x} - 8}{\sigma / \sqrt{n}} \]

Assuming a population standard deviation of 4.4855, the MINITAB output gives a test statistic of Z = 2.52 with a corresponding p-value of 0.006, and a 95% lower bound of 8.557.

Since the test statistic is in the critical region (2.52 > 1.645), we reject the null hypothesis. At the 5% significance level, the sample data give enough evidence to support the claim that "the average (mean) number of years lived in the current home is greater than 8 years".

Again, the p-value for this test is 0.006, which is smaller than the significance level of 0.05, so we reject the null hypothesis under the p-value approach too.

The 95% lower confidence bound in this case is 8.557; we can be 95% confident that the average (mean) number of years lived in the current home is greater than 8.557.

d.
the average (mean) credit balance for rural customers is less than $3,200.

Ans: Here we only need to consider the data from rural customers. With \(\mu\) denoting the population mean credit balance of rural customers, the hypotheses are H0: \(\mu \ge 3200\) against H1: \(\mu < 3200\). Since the sample size is 13 (less than 30), we use a one-sample t-test for the mean, with 13 - 1 = 12 degrees of freedom. The critical value at the 5% level of significance is \(t_{12,\,0.05} = 1.7823\), and we reject the null hypothesis if the test statistic t < -1.7823. The test statistic is

\[ t = \frac{\bar{x} - 3200}{S / \sqrt{n}} \]

where S is the sample standard deviation.

The MINITAB output gives a test statistic of t = -1.35 with a corresponding p-value of 0.1, and a 95% upper bound of 3251.

Since the test statistic is not in the critical region (-1.35 is not less than -1.7823), we fail to reject the null hypothesis. This does not prove the null hypothesis true; it means that, at the 5% significance level, the sample does not provide enough evidence to support the claim that "the average (mean) credit balance for rural customers is less than $3,200".

Again, the p-value for this test is 0.1, which is larger than the significance level of 0.05, so we fail to reject the null hypothesis under the p-value approach too.

The 95% upper confidence bound in this case is 3251; we can be 95% confident that the average (mean) credit balance for rural customers is less than $3,251. Because this bound lies above 3,200, a mean of $3,200 or more cannot be ruled out.
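For speculation d, the small-sample t statistic and upper confidence bound can be checked with a short Python sketch. It uses only the standard library, so the critical value 1.7823 is taken from the text rather than computed, and no p-value is produced. The 13 balances are an assumption read off the rural rows of the data set; the helper name `t_test_mean_less` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Credit balances ($) of the 13 rural customers, read from the data set.
rural_balance = [2047, 2660, 2766, 2697, 2914, 3104, 3178,
                 3203, 3250, 3257, 3304, 3342, 3788]

T_CRIT = 1.7823  # t critical value for df = 12 at the 5% level, as quoted in the text


def t_test_mean_less(sample, mu0, t_crit):
    """Lower-tail one-sample t test (H1: mu < mu0) using a tabulated critical value."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)                  # sample standard deviation over sqrt(n)
    t = (mean(sample) - mu0) / se                 # test statistic
    upper_bound = mean(sample) + t_crit * se      # one-sided upper confidence bound
    reject = t < -t_crit                          # decision by the critical-value rule
    return t, upper_bound, reject


t, upper, reject = t_test_mean_less(rural_balance, mu0=3200, t_crit=T_CRIT)
print(f"t = {t:.2f}, 95% upper bound = {upper:.0f}, reject H0: {reject}")
```

The statistic comes out near -1.35 and the bound near 3251, so the decision is not to reject H0, in line with the Minitab output for this speculation.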
Project Part C: Regression and Correlation Analysis

Using MINITAB, perform the regression and correlation analysis for the data on INCOME (Y) and CREDIT BALANCE (X) by answering the following.

1. Generate a scatterplot for INCOME ($1000) vs. CREDIT BALANCE ($), including the graph of the "best fit" line. Interpret.

Ans: The required graph is given below.

[Figure: Scatter plot for Income ($1000) vs Credit Balance ($), with the best-fit line]

Each point in the scatter plot represents one customer's combination of Income and Credit Balance. Income tends to increase as Credit Balance increases, so the scatter plot indicates a positive relationship between the two variables, and we expect a positive correlation between them. The best-fit (linear regression) line follows the overall trend of the data well. Based on the scatter plot, a customer with a high income is expected to have a high credit balance.

2. Determine the equation of the "best fit" line, which describes the relationship between INCOME and CREDIT BALANCE.
The MINITAB output is given below.

Regression Analysis: Income($1000) versus Credit Balance($)

The regression equation is
Income($1000) = - 3.52 + 0.0119 Credit Balance($)

Predictor          Coef      SE Coef   T      P
Constant           -3.516    5.483     -0.64  0.524
Credit Balance($)  0.011926  0.001289  9.25   0.000

S = 8.40667   R-Sq = 64.1%   R-Sq(adj) = 63.3%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      1   6052.7  6052.7  85.65  0.000
Residual Error  48  3392.3  70.7
Total           49  9445.0

So, based on the output, the equation of the best-fit line is Income = -3.516 + 0.011926 * Credit Balance, where Credit Balance is in $ and Income is in $1,000.

3. Determine the coefficient of correlation. Interpret.

The coefficient of correlation between the two variables is 0.801. This large positive value tells us that there is a strong positive linear relationship between the two variables: when one variable increases (decreases), the other tends to increase (decrease) as well. The correlation describes the strength and direction of the relationship, not the size of the change in the original units.

4. Determine the coefficient of determination. Interpret.

The coefficient of determination is 0.641, or 64.1%. This value measures how well the independent variable predicts the dependent variable: 64.1% of the variation in the dependent variable (Income) is explained by the regression model. This moderate value of R-Sq implies that the model is a moderately good fit.

5. Test the utility of this regression model (use a two tail test with \(\alpha = .05\)). Interpret your results, including the p-value.

The utility of this model can be tested by a t-test for beta-1. From the output, the test statistic for that test is 9.25 with a corresponding p-value of 0.000. Since the p-value is smaller than the significance level \(\alpha = .05\), we reject the hypothesis that the slope is zero and conclude that the model is significant.

6.
Based on your findings in 1-5, what is your opinion about using CREDIT BALANCE to predict INCOME? Explain.

In 1-5 we saw that the model is significant, which means the independent variable is significant in predicting the dependent variable. Using Credit Balance to predict Income is therefore appropriate, and with R-Sq = 64.1% Credit Balance predicts Income reasonably well.

7. Compute the 95% confidence interval for beta-1 (the population slope). Interpret this interval.

The 95% confidence interval for \(\beta_1\) is (0.009335272, 0.01451756). We can be 95% confident that the population slope lies in this interval; that is, each additional dollar of credit balance is associated with an increase in mean income of between about $9.34 and $14.52.

8. Using an interval, estimate the average income for customers that have credit balance of $4,000. Interpret this interval.

The estimated interval is (41.77, 46.61). We can be 95% confident that the mean income of all customers who have a credit balance of $4,000 is between $41,770 and $46,610.

9. Using an interval, predict the income for a customer that has a credit balance of $4,000. Interpret this interval.

The prediction interval is (27.11, 61.27). We can be 95% confident that the income of an individual customer with a credit balance of $4,000 is between $27,110 and $61,270. This interval is wider than the one in 8 because it covers a single customer rather than an average.

10. What can we say about the income for a customer that has a credit balance of $10,000? Explain your answer.

Putting the credit balance value of $10,000 into the regression model gives Income = -3.516 + 0.011926 * 10,000, or about 115.75. So a person with a credit balance of $10,000 would be expected to have an income of about $115,748 based on the fitted regression model. However, $10,000 is far above the largest credit balance in the sample ($5,861), so this is an extrapolation beyond the range of the data and the estimate should not be relied on.

11. In an attempt to improve the model, we attempt to do a multiple regression model predicting INCOME based on CREDIT BALANCE, YEARS and SIZE. Using MINITAB, run the multiple regression analysis using the variables CREDIT BALANCE, YEARS and SIZE to predict INCOME. State the equation for this multiple regression model.
The output in this case is given below.

Regression Analysis: Income($1000) versus Credit Balance($), Size, Years

The regression equation is
Income($1000) = - 13.2 + 0.0108 Credit Balance($) + 0.615 Size + 1.21 Years

Predictor          Coef       SE Coef    T      P
Constant           -13.186    3.608      -3.65  0.001
Credit Balance($)  0.0107922  0.0008184  13.19  0.000
Size               0.6151     0.4178     1.47   0.148
Years              1.2097     0.2322     5.21   0.000

S = 5.26121   R-Sq = 86.5%   R-Sq(adj) = 85.6%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      3   8171.7  2723.9  98.41  0.000
Residual Error  46  1273.3  27.7
Total           49  9445.0

Source             DF  Seq SS
Credit Balance($)  1   6052.7
Size               1   1368.0
Years              1   750.9

So the fitted regression equation is Income = -13.186 + 0.0107922 * Credit Balance + 0.6151 * Size + 1.2097 * Years.

12. Perform the Global Test for Utility (F-Test). Explain your conclusion.

From the MINITAB output, the F-test statistic is 98.41 with a corresponding p-value of 0.000. The null hypothesis that all slope coefficients are zero is therefore rejected, and we conclude that the regression model is significant in predicting the dependent variable Income.

13. Perform the t-test on each independent variable. Explain your conclusions and clearly state how you should proceed. In particular, which independent variables should we keep and which should be discarded.

From the output, the t-test statistic for Credit Balance is 13.19 with a p-value of 0.000, for Size it is 1.47 with a p-value of 0.148, and for Years it is 5.21 with a p-value of 0.000. The p-values for Credit Balance and Years are smaller than 0.05, so these variables are significant in predicting Income and we should keep them in the model. The p-value for Size is greater than 0.05, so Size is not significant in predicting Income once the other two variables are in the model, and we should remove it.

14. Is this multiple regression model better than the linear model that we generated in parts 1-10? Explain.

To answer this, we compare the R-Sq (coefficient of determination) of the two models.
The coefficient of determination for the multiple regression model (86.5%) is greater than that of the simple linear model (64.1%), so the multiple regression model explains a much larger share of the variation in Income. The adjusted R-Sq, which accounts for the number of predictors, tells the same story (85.6% versus 63.3%). The multiple linear regression model is therefore better than the simple linear regression model.

MATH 533: Applied Managerial Statistics

Part C: Regression and Correlation Analysis

Using MINITAB, perform the regression and correlation analysis for the data on CREDIT BALANCE (Y) and SIZE (X) by answering the following.

1. Generate a scatterplot for CREDIT BALANCE vs. SIZE, including the graph of the "best fit" line. Interpret.

[Figure: Scatterplot of Credit Balance($) vs Size, with the best-fit line]

The scatter plot of Credit Balance ($) versus Size shows that the slope of the "best fit" line is upward (positive); this indicates that Credit Balance varies directly with Size. As Size increases, Credit Balance also increases, and vice versa.

Correct

MINITAB OUTPUT:

Regression Analysis: Credit Balance($) versus Size

The regression equation is
Credit Balance($) = 2591 + 403 Size

Predictor  Coef    SE Coef  T      P
Constant   2591.4  195.1    13.29  0.000
Size       403.22  50.95    7.91   0.000

S = 620.162   R-Sq = 56.6%   R-Sq(adj) = 55.7%

Analysis of Variance
Source          DF  SS        MS        F      P
Regression      1   24092210  24092210  62.64  0.000
Residual Error  48  18460853  384601
Total           49  42553062

Predicted Values for New Observations
New Obs  Fit     SE Fit  95% CI            95% PI
1        4607.5  119.0   (4368.2, 4846.9)  (3337.9, 5877.2)

Values of Predictors for New Observations
New Obs  Size
1        5.00

2. Determine the equation of the "best fit" line, which describes the relationship between CREDIT BALANCE and SIZE.

The equation of the "best fit" line, which describes the relationship between Credit Balance and Size, is Credit Balance ($) = 2591 + 403.2 Size.

Correct

3. Determine the coefficient of correlation. Interpret.

The coefficient of correlation is given as r = 0.752.
The correlation coefficient is positive, which indicates a direct relationship between the variables. The p-value of 0.000 reported with it is extremely low, which means that a correlation this strong would be very unlikely to arise by chance if Credit Balance and Size were unrelated.

Correct

MINITAB OUTPUT:

Pearson correlation of Credit Balance ($) and Size = 0.752
P-Value = 0.000

4. Determine the coefficient of determination. Interpret.

The coefficient of determination is R-Sq = 0.566. The proportion of variability in a data set that is accounted for by the regression model is given by the coefficient of determination \(R^2\), which for this regression model is 56.6%.

Correct

MINITAB OUTPUT:

S = 620.162   R-Sq = 56.6%   R-Sq(adj) = 55.7%

5. Test the utility of this regression model (use a two tail test with \(\alpha = .05\)). Interpret your results, including the p-value.

The null hypothesis, H0, states that there is no significant correlation, i.e., the correlation coefficient = 0.
Significance level: \(\alpha = 0.05\)
Decision rule: Reject H0 if p-value < 0.05

From the Analysis of Variance table, I find that the p-value is 0.000, which is much less than 0.05. Therefore, I reject the null hypothesis of no significant correlation and conclude that, according to the overall test of significance, the regression model is valid.

Correct

MINITAB OUTPUT:

Analysis of Variance
Source          DF  SS        MS        F      P
Regression      1   24092210  24092210  62.64  0.000
Residual Error  48  18460853  384601
Total           49  42553062

6. Based on your findings in 1-5, what is your opinion about using SIZE to predict CREDIT BALANCE? Explain.

Based on my findings, Size is a good predictor of Credit Balance because the two variables are positively correlated: as household Size increases, Credit Balance tends to increase as well. This describes an association between the variables; it does not by itself show that one causes the other.

Correct

7. Compute the 95% confidence interval for \(\beta_1\).
Interpret this interval.

N/A

8. Using an interval, estimate the average credit balance for customers that have household size of 5. Interpret this interval.

The average credit balance for customers with a household size of 5 is estimated to lie within the interval (4368.2, 4846.9). This is the 95% confidence interval estimate for the mean credit balance of customers that have a household size of 5.

Correct

MINITAB OUTPUT:

Predicted Values for New Observations
New Obs  Fit     SE Fit  95% CI            95% PI
1        4607.5  119.0   (4368.2, 4846.9)  (3337.9, 5877.2)

Values of Predictors for New Observations
New Obs  Size
1        5.00

9. Using an interval, predict the credit balance for a customer that has a household size of 5. Interpret this interval.

The credit balance for a customer that has a household size of 5 is expected to lie within the interval (3337.9, 5877.2). This is the 95% prediction interval for the credit balance of an individual customer that has a household size of 5.

Correct

MINITAB OUTPUT:

Predicted Values for New Observations
New Obs  Fit     SE Fit  95% CI            95% PI
1        4607.5  119.0   (4368.2, 4846.9)  (3337.9, 5877.2)

Values of Predictors for New Observations
New Obs  Size
1        5.00

10. What can we say about the credit balance for a customer that has a household size of 10? Explain your answer.

We cannot say anything about the credit balance for a customer that has a household size of 10. The maximum value of the predictor variable (Size) used to fit the regression model is only 7, which is much less than 10, so using the model here would mean extrapolating beyond the data. We therefore cannot use the given regression model to accurately estimate the credit balance for a customer that has a household size of 10.

Correct

In an attempt to improve the model, we attempt to do a multiple regression model predicting CREDIT BALANCE based on INCOME, SIZE and YEARS.

11. Using MINITAB, run the multiple regression analysis using the variables INCOME, SIZE and YEARS to predict CREDIT BALANCE. State the equation for this multiple regression model.
MINITAB OUTPUT:
Regression Analysis: Credit Balance($) versus Income ($1000), Size, Years

The regression equation is
Credit Balance($) = 1276 + 32.3 Income ($1000) + 347 Size + 7.9 Years

Predictor         Coef  SE Coef     T      P
Constant        1276.0    273.6  4.66  0.000
Income ($1000)  32.272    4.348  7.42  0.000
Size            346.85    36.03  9.63  0.000
Years             7.88    12.34  0.64  0.526

S = 424.715   R-Sq = 80.5%   R-Sq(adj) = 79.2%

Analysis of Variance
Source          DF        SS        MS      F      P
Regression       3  34255444  11418481  63.30  0.000
Residual Error  46   8297619    180383
Total           49  42553062

Source          DF    Seq SS
Income ($1000)   1  16703393
Size             1  17478430
Years            1     73620

Unusual Observations
Obs  Income ($1000)  Credit Balance($)     Fit  SE Fit  Residual  St Resid
  3            32.0             5100.0  3830.1    93.7    1269.9     3.07R
  5            31.0             1864.0  3001.7   139.3   -1137.7    -2.84R
 11            25.0             4208.0  3210.1   103.3     997.9     2.42R
 17            55.0             4412.0  5250.3   116.3    -838.3    -2.05R

R denotes an observation with a large standardized residual.

The multiple regression equation is:
Credit Balance($) = 1276 + 32.3 Income ($1000) + 347 Size + 7.9 Years

Correct

12. Perform the Global Test for Utility (F-Test). Explain your conclusion.

The null hypothesis, Ho, states that none of the predictors is useful, that is, the coefficients of Income, Size and Years are all equal to 0.
Significance level: α = 0.05
Decision rule: Reject Ho if p-value < 0.05

From the Analysis of Variance table, we find that the p-value (0.000) is much less than 0.05. Therefore, we reject the null hypothesis and conclude that, according to the overall test of significance, the multiple regression model is valid.
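The F statistic and R-Sq values in the output above follow directly from the sums of squares in the Analysis of Variance table. The sketch below recomputes them as a check. It uses only the numbers printed in the MINITAB output; the variable names are illustrative, and the p-value is not recomputed because that would need an F-distribution routine outside the Python standard library.

```python
import math

# Values copied from the MINITAB Analysis of Variance table above.
ss_regression = 34255444
ss_error = 8297619
ss_total = 42553062

n = 50  # sample size (Total DF + 1)
k = 3   # number of predictors: Income, Size, Years

df_regression = k
df_error = n - k - 1

# Mean squares and the global F statistic.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error

# Standard error of the regression, R-Sq and adjusted R-Sq.
s = math.sqrt(ms_error)
r_sq = ss_regression / ss_total
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / df_error

print(f"F = {f_stat:.2f} on ({df_regression}, {df_error}) degrees of freedom")
print(f"S = {s:.3f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

The recomputed values agree with the MINITAB output (F = 63.30, S = 424.715, R-Sq = 80.5%, R-Sq(adj) = 79.2%), which confirms that the table was read correctly.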
Correct

MINITAB OUTPUT:
Test for Equal Variances: Credit Balance($) versus Income ($1000)

95% Bonferroni confidence intervals for standard deviations
Income ($1000)  N    Lower    StDev   Upper
21              2  267.855   830.85  344720
22              2  188.069   583.36  242037
23              1        *        *       *
25              1        *        *       *
26              1        *        *       *
27              2  101.215   313.96  130260
29              1        *        *       *
30              3  123.736   309.43    7053
31              1        *        *       *
32              1        *        *       *
33              1        *        *       *
34              1        *        *       *
35              1        *        *       *
37              2  328.265  1018.23  422465
39              2  276.062   856.31  355281
40              1        *        *       *
41              1        *        *       *
42              1        *        *       *
44              1        *        *       *
46              1        *        *       *
48              2   80.471   249.61  103563
50              2  259.193   803.98  333571
51              1        *        *       *
52              1        *        *       *
54              3  396.622   991.86   22607
55              4  290.865   647.76    5780
61              1        *        *       *
62              2  221.807   688.01  285457
63              1        *        *       *
64              1        *        *       *
65              1        *        *       *
66              2   87.765   272.24  112951
67              2   70.212   217.79   90361

Bartlett's Test (Normal Distribution)
Test statistic = 5.59, p-value = 0.935
Levene's Test (Any Continuous Distribution)
Test statistic = 1.01, p-value = 0.479

Test for Equal Variances: Credit Balance($) versus Size

95% Bonferroni confidence intervals for standard deviations
Size   N    Lower    StDev    Upper
1      5  137.540  271.807  1303.27
2     15  459.836  698.998  1337.23
3      8  193.542  336.323   943.85
4      9  415.251  701.689  1796.00
5      5  340.696  673.284  3228.28
6      5  360.277  711.981  3413.83
7      3  150.085  356.267  5956.16

Bartlett's Test (Normal Distribution)
Test statistic = 8.07, p-value = 0.233
Levene's Test (Any Continuous Distribution)
Test statistic = 1.12, p-value = 0.369

Test for Equal Variances: Credit Balance($) versus Years

95% Bonferroni confidence intervals for standard deviations
Years  N    Lower    StDev   Upper
1      2  541.930  1714.03  875261
2      1        *        *       *
4      2  452.950  1432.60  731550
5      2  130.788   413.66  211232
6      1        *        *       *
7      2   78.920   249.61  127462
8      1        *        *       *
9      2   76.013   240.42  122768
10     2  135.483   428.51  218815
11     4  204.115   461.26    4413
12     4  348.641   787.86    7538
13     4  167.957   379.55    3631
14     5  584.321  1221.32    7236
15     3  232.333   590.58   14935
16     4  231.705   523.61    5010
17     2  111.114   351.43  179457
18     5  452.721   946.25    5607
19     2  121.398   383.96  196067
20     2  540.589  1709.78  873094
Bartlett's Test (Normal Distribution)
Test statistic = 13.77, p-value = 0.543
Levene's Test (Any Continuous Distribution)
Test statistic = 2.23, p-value = 0.029

Conclusion: since every p-value for Bartlett's Test (Normal Distribution) is greater than 0.05, I am unable to reject the null hypothesis of equal variances. Levene's Test, which does not assume Normality, also fails to reject equal variance for Income (p = 0.479) and Size (p = 0.369). For Years, however, the Levene p-value (0.029) is below 0.05, so that test does reject equal variance.

13. Perform the t-test on each independent variable. Explain your conclusions and clearly state how you should proceed. In particular, which independent variables should we keep and which should be discarded.

Test the significance of the individual coefficients of the independent variables.
The null hypothesis, Ho, states that the variable makes no significant contribution, that is, its coefficient = 0.
Decision rule: Reject Ho if p-value < 0.05

MINITAB OUTPUT:

Income ($1000)
Analysis of Variance
Source          DF        SS        MS      F      P
Regression       1  16703393  16703393  31.02  0.000
Residual Error  48  25849669    538535
Total           49  42553062

Years
Analysis of Variance
Source          DF        SS      MS     F      P
Regression       1      2878    2878  0.00  0.955
Residual Error  48  42550184  886462
Total           49  42553062

Size
Analysis of Variance
Source          DF        SS        MS      F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062

The independent variables Income ($1000) and Size should be kept because they make a significant contribution to the regression model, but the variable Years should be discarded because it does not. The t-tests in the multiple regression output of question 11 lead to the same decision (p = 0.000 for Income, 0.000 for Size, and 0.526 for Years).

Correct

14. Is this multiple regression model better than the linear model that we generated in parts 1-10? Explain.

The proportion of variability in the response that is accounted for by a model is given by the coefficient of determination, r-square. Thus, the higher the value of r-square, the better the regression model.
The value of r-square is greater for the multiple regression model (0.805) than for the linear regression model (0.566), and hence the multiple regression model is better than the linear regression model.

Correct

Project Part C: Grading Rubric

Category                  Description                                                                 Value  Your Points
Questions 1 - 12 and 14   Addressed with appropriate output, graphs and interpretations                  65           65
  (5 pts. each; everyone
  gets credit for No. 7)
Question 13               Addressed with appropriate output, graphs and interpretations                  15           15
Summary                   Writing, grammar, clarity, logic, and cohesiveness                             20           20
Total                     A quality paper will meet or exceed all of the above requirements.            100          100

Project Part B

a. The average (mean) annual income was greater than $45,000.

Hypothesis:
Null hypothesis, H0: the mean annual income is $45,000, that is, μ = 45.
Alternative hypothesis, H1: the mean annual income is greater than $45,000, that is, μ > 45 (one tailed hypothesis). Units are $1000.

Level of significance: Alpha = 0.05

Test statistic: Since the sample size is greater than 30, I use the z test for the mean. The z statistic is given as follows:

$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

From the given data, the MINITAB output (assuming a population standard deviation of 14.64) gives a test statistic of Z = 0.49 with corresponding p-value 0.311, and a 95% Lower Bound of 42.61.

Decision rule: The critical value for this test at the 5% level of significance is Z0.05 = 1.645. If z is less than z(0.05), I fail to reject the null hypothesis. Otherwise, if z > z(0.05), I reject the null hypothesis at the 5% level of significance.
Using the p-value approach, if the p-value is less than alpha (0.05), I reject the null hypothesis at the 5% level of significance. Otherwise, if the p-value is greater than alpha (0.05), I fail to reject the null hypothesis.

Conclusion: Since z < z(0.05), I fail to reject the null hypothesis. There is not sufficient evidence that the mean annual income is greater than $45,000.
Using the p-value approach, the p-value (0.311) is greater than alpha (0.05), hence I fail to reject the null hypothesis and reach the same conclusion: there is not sufficient evidence that the mean annual income is greater than $45,000.

The 95% Lower Confidence Bound in this case is 42.61, which tells us that I am 95% confident that the population mean annual income is greater than 42.61 (in $1000).

b. The true population proportion of customers who live in a suburban area is less than 45%.

Hypothesis:
Null hypothesis, H0: the proportion of customers who live in a suburban area is 0.45, that is, p = 0.45.
Alternative hypothesis, H1: the proportion of customers who live in a suburban area is less than 0.45, that is, p < 0.45 (left tailed hypothesis).

Level of significance: Alpha = 0.05

Test statistic: If n*p0 > 10 and n*(1 - p0) > 10, then use the following Z-test statistic:

$z = \frac{\hat{p} - 0.45}{\sqrt{0.45\,(1 - 0.45)/n}}$

where $\hat{p}$ is the sample proportion of customers who live in a suburban area and n is the sample size.

To calculate this in MINITAB, Location must be converted into a numerical value. After defining a new variable Sub (Sub = 1 if Location is Suburban, 0 otherwise), the test can be run in MINITAB. The MINITAB output gives Z = -2.13 with corresponding p-value 0.017, and a 95% Upper Bound of 0.406599.

Decision rule: The critical value for this test at the 5% level of significance is -Z0.05 = -1.645. If z is less than -z(0.05), I reject the null hypothesis. Otherwise, if z > -z(0.05), I fail to reject the null hypothesis at the 5% level of significance.
Using the p-value approach, if the p-value is less than alpha (0.05), I reject the null hypothesis at the 5% level of significance. Otherwise, if the p-value is greater than alpha (0.05), I fail to reject the null hypothesis.

Conclusion: Since z (-2.13) is less than -z(0.05), I reject the null hypothesis and conclude that the proportion of customers who live in a suburban area is less than 0.45, that is, p < 0.45.
Using the p-value approach, the p-value (0.017) is less than alpha (0.05), hence I reject the null hypothesis and conclude that the proportion of customers who live in a suburban area is less than 0.45, that is, p < 0.45.

The 95% Upper Confidence Bound in this case is 0.406599, which tells us that I am 95% confident that the true population proportion of customers who live in a suburban area is less than 40.6599%.

c. The average (mean) number of years lived in the current home is greater than 8 years.

Hypothesis:
Null hypothesis, H0: the mean number of years lived in the current home is equal to 8 years, that is, μ = 8.
Alternative hypothesis, H1: the mean number of years lived in the current home is greater than 8 years, that is, μ > 8 (one tailed hypothesis).

Level of significance: Alpha = 0.05

Test statistic: Since the sample size is greater than 30, I use the z test for the mean. The z statistic is given as follows:

$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{\bar{x} - 8}{\sigma / \sqrt{n}}$

From the given data, the MINITAB output (assuming a population standard deviation of 4.4855) gives Z = 2.52 with corresponding p-value 0.006, and a 95% Lower Bound of 8.557.

Decision rule: The critical value for this test at the 5% level of significance is Z0.05 = 1.645. If z is less than z(0.05), I fail to reject the null hypothesis. Otherwise, if z > z(0.05), I reject the null hypothesis at the 5% level of significance.
Using the p-value approach, if the p-value is less than alpha (0.05), I reject the null hypothesis at the 5% level of significance. Otherwise, if the p-value is greater than alpha (0.05), I fail to reject the null hypothesis.

Conclusion: Since z > z(0.05), I reject the null hypothesis and conclude that the mean number of years lived in the current home is greater than 8 years.
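The two decision rules used in parts a, b and c (critical value and p-value) can be checked numerically from the reported z statistics. The sketch below is a minimal illustration that uses Python's standard-library `statistics.NormalDist`; the function name `one_tailed_z_test` and the dictionary layout are illustrative choices, and the z values are the rounded figures quoted from the MINITAB output rather than values recomputed from the raw data.

```python
from statistics import NormalDist

ALPHA = 0.05
STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def one_tailed_z_test(z, tail, alpha=ALPHA):
    """Return (p_value, critical_value, reject_h0) for a one-tailed z test."""
    if tail == "right":
        p_value = 1 - STD_NORMAL.cdf(z)
        critical = STD_NORMAL.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        p_value = STD_NORMAL.cdf(z)
        critical = STD_NORMAL.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return p_value, critical, reject


# z statistics as reported in the MINITAB output for parts a, b and c.
reported = {
    "a": (0.49, "right"),   # mean income > 45
    "b": (-2.13, "left"),   # suburban proportion < 0.45
    "c": (2.52, "right"),   # mean years > 8
}

results = {part: one_tailed_z_test(z, tail) for part, (z, tail) in reported.items()}

for part, (p_value, critical, reject) in results.items():
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"part {part}: p-value = {p_value:.3f}, critical z = {critical:.3f}, {decision}")
```

Because the z statistics are rounded to two decimals, the recomputed p-values can differ from the MINITAB figures in the third decimal place, but the decisions match: fail to reject in part a, reject in parts b and c.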
That is, μ > 8.

Using the p-value approach, the p-value (0.006) is less than alpha (0.05), hence I reject the null hypothesis and conclude that the mean number of years lived in the current home is greater than 8 years, that is, μ > 8.

The 95% Lower Confidence Bound in this case is 8.557, which tells us that I am 95% confident that the population mean number of years lived in the current home is greater than 8.557.

d. The average (mean) credit balance for rural customers is less than $3200.

Hypothesis:
Null hypothesis, H0: the mean credit balance for rural customers is equal to $3200, that is, μ = 3200.
Alternative hypothesis, H1: the mean credit balance for rural customers is less than $3200, that is, μ < 3200 (left tailed hypothesis).

Level of significance: Alpha = 0.05

Test statistic: Since the sample size is less than 30, I use the t test for the mean. The t statistic is given as follows:

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{\bar{x} - 3200}{s / \sqrt{n}}$

From the given data, the MINITAB output gives t = -1.35 with corresponding p-value 0.1, and a 95% Upper Bound of 3251.

Decision rule: Df = n - 1 = 13 - 1 = 12. The critical value for this test at the 5% level of significance is -t(12, 0.05) = -1.7823. If t is less than -t(0.05), I reject the null hypothesis. Otherwise, if t > -t(0.05), I fail to reject the null hypothesis at the 5% level of significance.
Using the p-value approach, if the p-value is less than alpha (0.05), I reject the null hypothesis at the 5% level of significance. Otherwise, if the p-value is greater than alpha (0.05), I fail to reject the null hypothesis.

Conclusion: Since t (-1.35) > -t(0.05), I fail to reject the null hypothesis. There is not sufficient evidence that the mean credit balance for rural customers is less than $3200.

Using the p-value approach, the p-value (0.1) is greater than alpha (0.05), hence I fail to reject the null hypothesis: there is not sufficient evidence that the mean credit balance for rural customers is less than $3200.
That is, the data are consistent with μ = 3200.

The 95% Upper Confidence Bound in this case is 3251, which tells us that I am 95% confident that the population mean credit balance of rural customers is less than 3251.

Project Part C

Linear Regression Model

The output of linear regression as obtained in Minitab is shown below.

Regression Analysis: Income ($1,000) versus Credit Balance($)

The regression equation is
Income ($1,000) = - 3.52 + 0.0119 Credit Balance($)

Predictor               Coef   SE Coef      T      P
Constant              -3.516     5.483  -0.64  0.524
Credit Balance ($)  0.011926  0.001289   9.25  0.000

S = 8.40667   R-Sq = 64.1%   R-Sq(adj) = 63.3%

Analysis of Variance
Source          DF      SS      MS      F  P
Regression       1  6052.7  6052.7  85.65  0
Residual Error  48  3392.3    70.7
Total           49  9445.0

Unusual Observations
Obs  Credit Balance ($)  Income ($1000)    Fit  SE Fit  Residual  St Residual
  2                2047          25.000   20.9    2.96       4.1       0.52 X
  4                3913              26  43.15    1.23    -17.15      -2.06 R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

1. Scatterplot for income ($1,000) versus credit balance($)

The scatter plot between Income and Credit Balance is shown below.

[Figure: Scatterplot of Income ($1,000) vs Credit Balance($), with linear fit. Income axis from 20 to 80; Credit Balance axis from 2000 to 6000.]

As is evident from the scatter plot, there is a clear relationship between the two variables. Income and Credit Balance exhibit a strong positive linear relationship: as credit balance increases, income tends to increase as well.

2. Equation of the best fit line

The equation of the best fit line as obtained from the regression analysis is shown below.

Income ($1,000) = - 3.52 + 0.0119 Credit Balance ($)

Here, the intercept (-3.52) is the predicted income, in $1,000, when the credit balance is $0. A credit balance of $0 lies outside the range of the data, so the intercept has no practical interpretation. The slope implies that each $1 increase in credit balance is associated with an increase of 0.0119 units ($1,000) in income.

3. Coefficient of correlation

Coefficient of correlation = sqrt (coefficient of determination) = sqrt (0.641) = 0.800625

The positive root is taken because the slope is positive. This implies that Income and Credit Balance exhibit a strong positive linear relationship: as credit balance increases, income tends to increase as well.

4. Coefficient of determination

Coefficient of determination = 64.1%, that is, 64.1% of the variation in the dependent variable (income) is explained by the independent variable (credit balance).

5. Utility of this regression model

Null Hypothesis, Ho: Model is not significant
Alternative Hypothesis, H1: Model is significant

From the analysis of variance table in the regression analysis, p-value = 0.000. Since the p-value is less than alpha (0.05), I reject Ho at the 5% level of significance and conclude that the model is significant.

6. Using credit balance to predict income

Since the model is significant and 64.1% of the variation in income is explained by credit balance, I can say that credit balance can be used to predict income.

7. 95% confidence interval for beta-1

The confidence interval for beta-1 is given by

beta1 ± t(a/2, n-k-1) * SE(beta1) = 0.011926 ± 2.0106 * 0.001289 = 0.011926 ± 0.002592 = (0.009334, 0.014518)

This implies that I am 95% confident that the true value of the coefficient beta1 lies in this interval.

8. Interval estimate of the average income for a credit balance of $4,000

Predicted Values for New Observations
New Obs    Fit  SE Fit          95% CI          95% PI
      1  44.19    1.21  (41.77, 46.61)  (27.11, 61.27)

Values of Predictors for New Observations
New Obs  Credit Balance ($)
      1                4000

The confidence interval for the average income of customers that have a credit balance of $4,000 is (41.77, 46.61). This implies that I am 95% confident that the average income of customers who have a credit balance of $4,000 lies in this interval.

9. Interval prediction of the income for a credit balance of $4,000

Predicted Values for New Observations
New Obs    Fit  SE Fit          95% CI          95% PI
      1  44.19    1.21  (41.77, 46.61)  (27.11, 61.27)

Values of Predictors for New Observations
New Obs  Credit Balance ($)
      1                4000

The prediction interval for the income of a customer that has a credit balance of $4,000 is (27.11, 61.27). This implies that I am 95% confident that the income of an individual customer who has a credit balance of $4,000 lies in this interval.

10. What can we say about the income for a customer that has a credit balance of $10,000? Explain your answer.

Predicted Values for New Observations
New Obs     Fit  SE Fit            95% CI               95% PI
      1  115.75    7.63  (100.41, 131.08)  (92.92, 138.57) XX

XX denotes a point that is an extreme outlier in the predictors.

Values of Predictors for New Observations
New Obs  Credit Balance ($)
      1               10000

The model gives a fitted income of 115.75 ($1000), which is $115,750, for a customer that has a credit balance of $10,000. However, MINITAB flags this point with XX as an extreme outlier in the predictors: $10,000 lies far outside the range of credit balances in the sample, so this extrapolated value should not be relied on.

Multiple Regression Model

In an attempt to improve the model, we attempt to do a multiple regression model predicting income based on credit balance, years, and size. The output of multiple regression as obtained in Minitab is shown below.
Regression Analysis: Income ($1,000) versus Credit Balance, Years, Size

The regression equation is
Income ($1,000) = - 13.2 + 0.0108 Credit Balance($) + 1.21 Years + 0.615 Size

Predictor               Coef   SE Coef      T      P
Constant             -13.186     3.608  -3.65  0.001
Credit Balance ($)  0.007922  0.000818  13.19  0.000
Years                 1.2097    0.2322   5.21  0.000
Size                  0.6151    0.4178   1.47  0.148

S = 5.26121   R-Sq = 86.5%   R-Sq(adj) = 85.6%

Analysis of Variance
Source          DF      SS      MS      F  P
Regression       3  8171.7  2723.9  98.41  0
Residual Error  46  1273.3    27.7
Total           49  9445.0

Source              DF  Seq SS
Credit Balance ($)   1  6052.7
Years                1  2059.0
Size                 1    60.0

Unusual Observations
Obs  Credit Balance ($)  Income ($1000)     Fit  SE Fit  Residual  St Residual
  2                2047          25.000  13.786   2.415    11.214        2.40R

R denotes an observation with a large standardized residual.

11. Equation for multiple regression model

The regression equation is
Income ($1,000) = - 13.2 + 0.0108 Credit Balance($) + 1.21 Years + 0.615 Size

Here, the intercept (-13.2) is the predicted income, in $1,000, when credit balance, years, and size are all 0. Each slope gives the change in predicted income for a one-unit increase in that variable, holding the other variables fixed:
- with a unit increase in credit balance, there is a 0.0108 unit ($1,000) increase in income;
- with a unit increase in years, there is a 1.21 unit ($1,000) increase in income;
- with a unit increase in size, there is a 0.615 unit ($1,000) increase in income.

12. Global test for Utility (F-Test)

Null Hypothesis, Ho: Model is not significant
Alternative Hypothesis, H1: Model is significant

From the analysis of variance table in the regression analysis, p-value = 0.000. Since the p-value is less than alpha (0.05), I reject Ho at the 5% level of significance and conclude that the model is significant.

13. t-test on independent variables

For variable credit balance, the null hypothesis is Ho1: beta1 = 0 (beta1 is not significant) versus the alternative hypothesis H11: beta1 =/= 0 (beta1 is significant).
Since the p-value = 0.000 is less than alpha (0.05), I reject Ho1 at the 5% level of significance and conclude that beta1 is significant. Hence credit balance should be included in the model.

For variable years, the null hypothesis is Ho2: beta2 = 0 (beta2 is not significant) versus the alternative hypothesis H12: beta2 =/= 0 (beta2 is significant). Since the p-value = 0.000 is less than alpha (0.05), I reject Ho2 at the 5% level of significance and conclude that beta2 is significant. Hence years should be included in the model.

For variable size, the null hypothesis is Ho3: beta3 = 0 (beta3 is not significant) versus the alternative hypothesis H13: beta3 =/= 0 (beta3 is significant). Since the p-value = 0.148 is greater than alpha (0.05), I fail to reject Ho3 at the 5% level of significance and conclude that beta3 is not significant. Hence size should not be included in the model.

14. Multiple regression model versus linear model

Both models are significant. For the linear model, $R^2$ = 64.1%, and for the multiple regression model, $R^2$ = 86.5%. This implies that the variation in income is better explained by the independent variables of the multiple regression model than by the linear regression model. Hence the multiple regression model is better than the linear regression model. However, the coefficient of size is not significant, so size should not be included in the model. A multiple regression model with only Credit Balance and Years as independent variables is therefore the most appropriate.

Statistics Study Guide

2 Hours and 45 Minutes

Some of the key study areas are shown below. Although these are key areas, remember that the exam is comprehensive for all of the assigned course content and this study guide may not be all-inclusive.
TCO A
  o Descriptive Statistics and Exploratory Data Analysis
  o Central Tendency
  o Dispersion
  o The Shape of the Distribution
  o Numbers, Graphs, and Tables
  o One Variable and Two Variables
TCO B
  o Contingency Tables
  o The Binomial Distribution
  o The Normal Distribution
TCO C
  o Confidence Intervals/Sample Sizes: For the Population Mean
  o Confidence Intervals/Sample Sizes: For the Population Proportion
TCO D
  o Hypothesis Tests: For the Population Proportion
  o Hypothesis Tests: For the Population Mean
TCO E
  o Simple Linear Regression
  o Multiple Regression
TCO F
  o Use of Minitab in each of the above TCOs

TCO A: Given a managerial problem and accompanying data set, construct graphs (following principles of ethical data presentation), calculate and interpret numerical summaries appropriate for the situation. Use the graphs and numerical summaries as aids in determining a course of action relative to the problem at hand.

The following Table provides an overview of the content of the 10 Questions and the possible points for each Question.

#   Content                                     TCO  Points  Minitab and Calculator Notes
1   Descriptive Statistics                      A    33      Minitab Required
2   Contingency Table                           B    18      Calculator Only
3   Binomial Distribution                       B    18      Minitab Required
4   Normal Distribution                         B    18      Minitab Required
5   Mean: Confidence Interval/Sample Size       C    18      Minitab Required And Calculator
6   Proportion: Confidence Interval/Sample Size C    18      Minitab Required And Calculator
7   Proportion: Hypothesis Test                 D    24      Minitab Required
8   Mean: Hypothesis Test                       D    24      Minitab Required
9   Simple Linear Regression                    E    48      Minitab Output Provided
10  Multiple Regression                         E    31      Minitab Output Provided

What follows is a Sample with 10 representative problems. Actual problems will be similar, but will vary in the scenario presented, the numbers used, and the particular variation of the problem type presented.
Demo

Question 1: TCO A and F

The following raw data is the result of selecting a random sample of new Ford Escorts and testing these cars for fuel efficiency (results are in miles per gallon):

40.2 32.9 29.2 26.5 33.3 28.4 28.8 34.2 30.3
35.7 25.8 37.5 35.6 40.8 33.3 26.6 33.5 28.1

Use Minitab to run the key Descriptive Statistics, i.e., mean, median, mode, standard deviation, variance, quartiles, interquartile range, and range. Interpret your Minitab output, including selected measures of central tendency, dispersion, and the shape of the distribution.

Answer:

Descriptive Statistics: CarsFuel
Variable  Total Count   Mean  StDev  Variance  Minimum     Q1  Median     Q3  Maximum
CarsFuel           18  32.26   4.59     21.10    25.80  28.32   33.10  35.63    40.80

Variable  Range   IQR  Mode  N for Mode
CarsFuel  15.00  7.30  33.3           2

Note: The Question may ask for only a selection of the Descriptive Statistics, along with their Interpretation.

Question 2: TCO B

The following table gives the number of claims at a large insurance company by kind and geographical region.

                      East  South  Midwest  West  Totals
Hospitalization         55    328       29    52     464
Physician's Visit      233    514      204   251    1202
Outpatient Treatment   100    526       65   102     793
Totals                 388   1368      298   405    2459

A. If a bill is chosen at random, what is the probability that it is either from the East or from the West?
B. If a bill is chosen at random, what is the probability that it is not both for Hospitalization and from the South?
C. Given that the bill is from the Midwest, what is the probability that it is for a Physician's Visit?
D. Given that the bill is not from either the East or the West, what is the probability that it is not for a Physician's Visit?

Answers:
A. (# East + # West)/Total = (388 + 405)/2459 = 0.32249
B. # Not (Hospitalization and South)/Total = (2459 - 328)/2459 = 2131/2459 = 0.866612
C. (# from Midwest AND Physician's Visit)/# from Midwest = 204/298 = 0.68456
D. # Not Physician's Visit and Not from East or West/# Not from East or West = 948/1666 = 0.56903

Question 3: TCO B and F

The probability of obtaining a home equity loan from United Bank is 0.40. A random sample of 10 loan applications is selected. Use Minitab to find the following probabilities.

A. Find the probability that at least five will get a loan.
B. Find the probability that none will get a loan.
C. Find the probability that less than three will get a loan.
D. Find the probability that three or less, or more than seven will get a loan.

Answers:
A. P(X >= 5) = 0.3669
B. P(X = 0) = 0.006
C. P(X < 3) = P(X <= 2) = 0.1673
D. P(X <= 3) + P(X >= 8) = 0.38230 + 0.01229 = 0.39459

Question 4: TCO B and F

The length of time to do a complete full battery check and replacement (if needed) at Insta-Bat is normally distributed with a mean of 16.2 minutes and a standard deviation of 2.4 minutes. Use Minitab to find the required probabilities.

A. Determine what percentage of complete full battery check and replacements (if needed) will fall between 10 and 20 minutes.
B. What percentage of complete full battery check and replacements (if needed) will take less than 20 minutes?
C. Using your answer from B., if 1000 cars had a complete full battery check and replacement (if needed), how many would you expect to be finished in less than 20 minutes?
D. What percentage would take less than 10 minutes?
E. Management wants to offer customers a guarantee that a complete full battery check and replacement (if needed) can be completed within a certain amount of time. What number of minutes should they include in their guarantee, to be sure they are achieving their goal 99% of the time?
F. What number of minutes should they include in their guarantee, to be sure they are achieving their goal 95% of the time?

Answers:
A. P(10 [...] X = 21.78, rounded up to 22
F. For P = 0.95, X = ? -> X = 20.15, rounded up to 21

Question 5: (TCO C)

Unoccupied seats on flights cause airlines to lose revenue.
Suppose a large airline wants to estimate its average number of unoccupied seats per flight over the past year. To accomplish this, the records of 225 flights are randomly selected, and the number of unoccupied seats is noted for each of the sampled flights. The results are the following.

Sample Size = 225
Sample Mean = 11.5956
Sample Standard Deviation = 4.1026

A. Compute the 95% confidence interval for the mean number of unoccupied seats.
B. Interpret the 95% confidence interval.
C. How many flights should be tested if the airline wants to be 95% confident of being within 1 seat of the population mean number of unoccupied seats?
D. How many flights should be tested if the airline wants to be 99% confident of being within 2 seats of the population mean number of unoccupied seats?

Answers:
A. Minitab Output:
One-Sample Z
The assumed standard deviation = 4.1026
  N    Mean  SE Mean            95% CI
225  11.596    0.274  (11.060, 12.132)
Therefore, the 95% confidence interval is (11.060, 12.132).
B. Interpretation: We are 95% confident the population mean for the number of unoccupied seats is between 11.060 and 12.132.
C. Sample Size: n = ((1.96*4.1026)/1)**2 = 64.66, rounded up to 65
D. Sample Size: n = ((2.575*4.1026)/2)**2 = 27.90, rounded up to 28

Question 6: (TCO C)

A food-products company conducted a market study by randomly sampling and interviewing 1,000 consumers to determine which brand of breakfast cereal they prefer. Suppose 313 consumers were found to prefer the company's brand. Estimate the true fraction of all consumers who prefer the company's cereal brand.

A. Compute the 95% confidence interval for the percent of consumers who prefer the company's brand of breakfast cereal.
B. Interpret this confidence interval.
C. How large a sample size will need to be selected if we wish to have a 95% confidence interval that is accurate to within 1.5%? (Assume p = .35)
D. How large will the sample size need to be if we wish to be accurate to within 2.0%, with 95% confidence?
Answers:
A. Minitab Output:
Test and CI for One Proportion
Sample    X     N  Sample p                95% CI
     1  313  1000  0.313000  (0.284259, 0.341741)
Using the normal approximation.
B. Interpretation: We are 95% confident the population proportion of consumers who prefer the company's brand of breakfast cereal is between 0.284 and 0.342.
C. Sample Size: n = p*q*(z/E)^2 = 0.35*0.65*(1.96/0.015)^2 = 3884.3, rounded up to n = 3885
D. Sample Size: n = p*q*(z/E)^2 = 0.35*0.65*(1.96/0.02)^2 = 2184.9, rounded up to n = 2185

Question 7: (TCO D)

Historically, 21% of professional tennis players use Wilson tennis balls. A randomly selected sample of 50 professional tennis players found that eight use Wilson tennis balls. Does the sample data provide evidence to conclude that the percentage of professional tennis players using Wilson tennis balls is less than 21% (with α = .10)? Use the hypothesis testing procedure, including the Hypotheses, all steps for both the p-value and rejection region approaches, and your Interpretation, using the variable and its units.

Answer: The hypotheses are Ho: p >= .21 and Ha: p < .21. For the given alpha = 0.10, the critical value = -1.282, and the rejection region [...] 50. For the given alpha = 0.02, the critical value = 2.054, and the rejection region is >= 2.054. The test statistic is z = 5.75 and the observed p-value = 0.000. Because the test statistic is in the rejection region, and the p-value is less than alpha, we can reject the null hypothesis, that the life of the jogging shoes is 50 hours, at alpha = .02. That is, we are at least 98% sure that the average life of shoes exceeds 50 hours.

Question 9:

Use Minitab and the data below to run the Linear Regression, and use the Minitab output to answer the following questions. You can Copy and Paste the data set into Excel, and then Copy and Paste it into Minitab. NOTE: ON THE FINAL EXAM YOU WILL BE PROVIDED THE MINITAB OUTPUT.

Suppose an appliance store conducts a 5-month experiment to determine the effect of advertising on sales revenue.
Below is the output, including the scatterplot and regression output. Answer the questions below. (Data set: ADSALES)

Advexp X ($100s)  Sales Y ($1000s)
1                 1
2                 1
3                 2
4                 2
5                 4

a. Analyze the Minitab output to determine the regression equation.
b. Find and interpret $\hat{\beta}_1$ in the context of this problem.
c. Find and interpret the coefficient of determination (r-squared).
d. Find and interpret the coefficient of correlation.
e. Does the data provide significant evidence (α = .05) that the advertising can be used to predict the sales? Test the utility of this model using a two-tailed test. Find the observed p-value and
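The Minitab output for the ADSALES data is not reproduced in this excerpt, so the quantities asked for in parts a through e can be sketched by hand from the five data points. The following is a minimal least-squares sketch in plain Python, not the Minitab procedure itself. The variable names are illustrative, and the critical value 3.182 is the standard two-tailed t-table entry for alpha = .05 with 3 degrees of freedom, supplied here as an assumption rather than taken from the source.

```python
import math

# ADSALES data from the table above.
advexp = [1, 2, 3, 4, 5]  # advertising expenditure X ($100s)
sales = [1, 1, 2, 2, 4]   # sales revenue Y ($1000s)

n = len(advexp)
x_bar = sum(advexp) / n
y_bar = sum(sales) / n

# Sums of squares and cross-products about the means.
ss_xx = sum((x - x_bar) ** 2 for x in advexp)
ss_yy = sum((y - y_bar) ** 2 for y in sales)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advexp, sales))

# Least-squares slope and intercept (part a and part b).
slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar

# Coefficient of correlation and coefficient of determination (parts c and d).
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r_sq = r ** 2

# t statistic for H0: beta1 = 0 (part e), with n - 2 degrees of freedom.
sse = ss_yy - slope * ss_xy
s = math.sqrt(sse / (n - 2))
se_slope = s / math.sqrt(ss_xx)
t_stat = slope / se_slope

T_CRITICAL = 3.182  # assumed t-table value: two-tailed, alpha = .05, df = 3

print(f"Sales = {intercept:.1f} + {slope:.1f} Advexp")
print(f"r = {r:.3f}, r-squared = {r_sq:.3f}")
print(f"t = {t_stat:.2f}, reject H0: {abs(t_stat) > T_CRITICAL}")
```

Under these assumptions the fitted line is Sales = -0.1 + 0.7 Advexp, so each additional $100 of advertising is associated with about $700 more in sales, and the slope is significant at the 5% level because the t statistic exceeds the critical value.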
