Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

ECON 2202 - Statistical Methods in Economics and Business II EXCEL ASSIGNMENT WINTER 2016 Due: by END OF CLASS (11:30) Thursday April 7, 2016 NO

ECON 2202 - Statistical Methods in Economics and Business II EXCEL ASSIGNMENT WINTER 2016 Due: by END OF CLASS (11:30) Thursday April 7, 2016 NO LATE ASSIGNMENTS ARE ACCEPTED. IT MUST BE SUBMITTED TO COMPLETE THE COURSE REQUIREMENTS READ ALL INSTRUCTIONS BEFORE BEGINNING!!!!!!!! 1. Late assignments receive a grade of 0. Failure to submit this assignment leads to an FND in the course. 2. You must use EXCEL in the questions where indicated for this assignment. Copy and paste the tables into EXCEL from the text below. You must hand in all EXCEL output where indicated. . 3. The assignment otherwise is hand written. Please label the Excel outputs with the question numbers. 4. Students may work together on the assignment, but each student MUST submit the assignment independently. 5. Keep a copy of this assignment to study from, solutions will be posted. 6. All hypothesis tests MUST use the five steps. Formulae are necessary in all cases unless otherwise noted. For this assignment ONLY you can follow these examples to keep the document smaller and save time!! The conclusions MUST be complete IN CONTEXT as always. F-test: 1. Ho: regression is not significant, Ha: regression is significant 2. =0.05, one-tailed. F,(k, n-k-1) = F0.05,(2,7) = 6.542 3. Reject Ho if F0 >F 4. F = MSR/MSE = 43.3648 5. Since F0 > F (43.3648> 6.542), reject Ho. The regression equation is significant; the relationship between apples and oranges is significant. OR if using p-values P-value approach 1. Ho: regression is not significant, Ha: regression is significant 2. =0.05, one-tailed 3. Reject Ho if p-value < = 0.05 4. p-value = 0.0001 5. Since p-value = 0.0001 < = 0.05, reject Ho. The regression equation is significant; the relationship between apples and oranges is significant. 1 1. (16 +3 for Excel work) A company is reviewing the performance of two different suppliers of toner ink as their intention is to compare them and possible switch to the alternate one. Using the following data for hours of toner use before empty and alpha = 0.05 in all cases answer the following. CURRENT SUPPLIER A 486 ALTERNATE SUPPLIER B 489 490 491 491 494 494 496 498 498 489 491 492 492 492 492 493 493 498 502 504 505 506 507 508 510 514 515 527 494 495 496 497 497 497 498 499 502 503 505 a. (5) Determine if the standard deviation of the Alternate Supplier B is significantly less than 5. Use alpha = 0.05. (No EXCEL) b. (3) Determine if the variation of the hours of toner use of Current Supplier A is greater than that of Alternate Supplier B. Use EXCEL, include printout of result and DO NOT USE p-values in test write up. c. (3) Determine if there is any difference in the variation of the hours of toner use between the two suppliers. Use EXCEL, include printout of result and DO NOT USE p-values in test write up. d. (3) Using the findings of part (c), test to determine if Current Supplier A has higher average hours of toner use than Alternate Supplier B. Use EXCEL, include printout of result and use pvalues in test. e. (2) What is your recommendation to the company with respect to the supplier to select? Explain. 2. (3+1 for Excel work) Public health in a local hospital decided to test whether the length of time that nurses wash their hands will alter the amount of bacteria remaining on them. To ensure 2 consistency in the testing nine nurses were selected and each washed their hands, in the first trial for 2.5 seconds and in the second for 15 seconds. The amount of remaining bacteria was measured in each case. Test the hypothesis that longer washing leads to less bacteria remaining. Use alpha = 0.05. CULTURE 1 66 132 120 187 190 17 33 92 1000 CULTURE 2 78 115 93 48 77 3 12 12 146 3 3. (15+1 for Excel work) Grading assignments is a real problem, taking a great deal of time and while many students do a poor job, many copy and gain no benefit. An instructor who taught three sections of a course carried out a test. One section had no assignments, one had assignments but they were not graded, and the final section had graded assignments. Grades were reviewed after the first midterm. Assume each population is normally distributed, and samples are independent random samples. Using EXCEL where possible perform ALL ANOVA tests necessary to determine which approach if any led to higher grades. Use alpha = 0.05 in all tests. Use EXCEL for the ANOVA test and use the p values in testing. For those tests for which you cannot use EXCEL, use the standard write up as used in the practice assignments. No Assignments 69 69 92 84 79 84 76 63 76 82 89 72 72 65 73 47 92 71 83 81 92 80 64 72 84 79 74 81 Assignments, Not Graded 73 63 68 79 57 68 72 74 49 84 79 71 80 74 71 63 88 83 89 82 69 92 79 81 76 81 81 75 Assignments, Graded 83 97 72 79 84 76 91 76 83 88 91 96 68 99 89 80 79 91 83 83 76 90 79 67 86 86 82 84 4 4. (9+1 for Excel work) The instructor in the previous example teaches the same sections of the course four times a year and wanted to investigate if there was not only a difference in the way assignments were treated but also in the time of year the course was given. The following data was collected. Assume all populations are normally distributed, observations are independent, and population variances are equal. Using EXCEL perform all remaining tests necessary to determine whether the grades were influenced by assignment method and/or semester. Use alpha = 0.05 in all tests. No post tests are required. TERM FALL WINTER EARLY SUMMER LATE SUMMER No Assignments 69 69 92 84 79 84 76 63 76 82 89 72 72 65 73 47 92 71 83 81 92 80 64 72 84 79 74 81 Assignments, Not Collected 73 63 68 79 57 68 72 74 49 84 79 71 80 74 71 63 88 83 89 82 69 92 79 81 76 81 81 75 Assignments, Collected 83 97 72 79 84 76 91 76 83 88 91 96 68 99 89 80 79 91 83 83 76 90 79 67 86 86 82 84 5 5. (16) Is the type of beverage ordered with lunch at a restaurant independent of the age of the customer? A random sample of 309 lunch customers was taken and the results are as follows. Using alpha = 0.01, test to determine if the two factors are related. Please show intermediate calculation steps for part marks. (No Excel in this question.) AGE 21-34 35-55 OVER 55 COFFEE/TE A 26 41 24 91 SOFT DRINK 95 40 13 148 OTHER 18 20 32 70 139 101 69 309 6. (13 +5 for Excel work) Can sales of major fast food chains be explained by the number of individual locations the chain has? Consider the following data where sales are measured in $ billions and locations are in thousands. MCDONALD'S BURGER KING TACO BELL PIZZA HUT WENDY'S KFC SUBWAY DAIRY QUEEN A&W SALES 17.1 7.9 4.8 4.7 4.6 4 2.9 2.7 2.7 LOCATION S 12.4 7.5 6.8 8.7 4.6 5.1 11.2 5.1 2.9 Use EXCEL, to create a scatter diagram, correlation matrix, and regression model. Conduct all tests at the 5% significance level. a) (2) Reviewing the data and the scatter diagram, does the correlation matrix result support the relationship you would expect? Are there any other considerations or concerns that arise from this? b) (5) Conduct a test to determine whether the population correlation coefficient is significantly different from zero. Do not use p values. c) (4) Is there an observation which is impacting the relationship? Given your findings in parts (a) and (b), explain what you might consider doing to further investigate this relationship? Rerun the correlation matrix and the equation and compare the two models with respect to the correlation of the variables, the fit (R2), and the significance of the slope. You do not need to write out the tests just use the p values to explain what happens and why when comparing the significance of the model or slope. 6 d) (2) Which model would you select to examine the relationship? Explain your choice. 7. (15+2 for Excel work) Consider the following data for the average cost of various fuels and electricity for a twelve year period. The data measure the following: Electricity Natural Gas Fuel Oil Gasoline Residential rate per kilowatt hour Residential natural gas per 1000 cubic feet Residential fuel oil per gallon Regular gasoline per gallon Your model will attempt to predict residential electricity costs using the cost of the other fuels. a) (1) State whether you would expect a positive or negative relationship between electricity costs and each of the independent variables and explain why. b) (2) Use EXCEL, to create a correlation using all independent variables provided. Discuss all aspects of the correlation matrix, as it compares to your expectations in part (a) and what it tells you about the potential results of a regression analysis. c) (3) Use EXCEL to run a multiple regression to estimate electricity costs using all independent variables provided. Test if the overall regression is significant at the 5% level. d) (3) Test for the significance of each slope coefficient at the 5% level. (Incorporate all tests into ONE five step answer as in the slides.) e) (2) Based on your correlation matrix and the results of your regression do you suspect multicollinearity? If yes, which independent variable(s) to you think could be responsible? f) (2) Explain how you test for multicollinearity but do not test. g) (2) How could you rerun the regression taking into account the issues in the first regression? Which variable or variables would you keep as an explanatory variable(s), and why? Electricit Natural Gasolin Fuel y Gas e Oil 2.54 1.29 0.39 0.21 3.51 1.71 0.57 0.31 4.64 2.98 0.86 0.44 5.36 3.68 1.19 0.61 6.2 4.29 1.31 0.76 6.86 5.17 1.22 0.68 7.18 6.06 1.16 0.65 7.54 6.12 1.13 0.69 7.79 6.12 1.12 0.61 7.41 5.83 0.86 0.34 7.41 5.54 0.9 0.42 7.49 4.49 0.9 0.33 7 1. (15+2 for Excel work) Consider the following data for the average cost of various fuels and electricity for a twelve year period. The data measure the following: Electricity Natural Gas Fuel Oil Gasoline Residential rate per kilowatt hour Residential natural gas per 1000 cubic feet Residential fuel oil per gallon Regular gasoline per gallon Your model will attempt to predict residential electricity costs using the cost of the other fuels. a) (1) State whether you would expect a positive or negative relationship between electricity costs and each of the independent variables and explain why. ALL are expected to have POSITIVE- Electricity cost is increasing as Natural gas Gasoline and Fuel oil increases As well as is as Natural gas Gasoline and increases as the cost of electricity increases. This shows there is positive correlation between electricity cost and other given three variables. b) (2) Use EXCEL, to create a correlation using all independent variables provided. Discuss all aspects of the correlation matrix, as it compares to your expectations in part (a) and what it tells you about the potential results of a regression analysis. The correlation matrix is as follows (Note: we are only interested in the first column) Electricity cost Electricity Natural Gas Gasoline Fuel Oil 1 Natural Gas 0.96411 1 Gasoline 0.687611 0.70089 1 Fuel Oil 0.497928 0.570268 0.933853 1 Discussion: The correlation is as expected in part (a) above. The correlation between electricity cost and Natural gas is positive with r=0.96411. This is a very high positive correlation. Electricity cost & gasoline and electricity cost & fuel oil also has a positive good correlation of o.687611 and r=0.49728 respectively. 1 c) (3) Use EXCEL to run a multiple regression to estimate electricity costs using all independent variables provided. SUMMARY OUTPUT Regression Statistics Multiple R 0.982909 R Square 0.966111 Adjusted R Square 0.953402 Standard Error 0.379873 Observations 12 ANOVA df SS Regression MS 3 32.91006 10.97002 8 1.154428 34.06449 Standard Error Significance F 0.144303 11 Residual Total F Coefficients 76.02049 t Stat P-value 3.2E-06 Lower 95% Upper 95% Lower 95.0% Upper 95.0% Intercept 1.198407 0.447384 2.678702 0.027982 0.166739 2.230075 0.166739 2.230075 Natural Gas 0.880628 0.099662 8.836167 2.12E-05 0.650808 1.110449 0.650808 1.110449 Gasoline 3.921086 1.413128 2.774757 0.024119 0.662407 7.179765 0.662407 7.179765 Fuel Oil -5.43712 1.856728 -2.92833 0.019047 -9.71874 -1.1555 -9.71874 -1.1555 Y=1.198407+0.880628X1+3.921086X2--5.43712x3 Test if the overall regression is significant at the 5% level. H0: overall regression model is not significant Ha: overall regression model is significant =0.05 F=MSE/MSR=76.02049 P(F=76.02049)=0.0000032 Since P-value <0.05 reject the H0 and conclude that overall regression model is significant at 5% level OR One-tailed. F,(k, n-k-1) = F0.05,(3,8) = 4.46. Since F0 > F (76.02049> 4.46), reject Ho. The regression equation is significant (3) Test for the significance of each slope coefficient at the 5% level. (Incorporate all tests into ONE five step answer as in the slides.) Coefficients Natural Gas Standard Error 0.880628 0.099662 Null hypothesis B1=0 Ha t Stat B10 8.836167 P-value 2.12E-05 decision Significant 2 Gasoline 3.921086 1.413128 Fuel Oil -5.43712 1.856728 B2=0 B20 B3=0 B20 2.774757 0.024119 -2.92833 0.019047 significant Significant d) (2) Based on your correlation matrix and the results of your regression do you suspect multicollinearity? If yes, which independent variable(s) to you think could be responsible? Yes: Fuel: Since the Regression coefficient is negative and correlation is positive. The two contradicts each other which shows there might be multicollinearity e) (2) Explain how you test for multicollinearity but do not test. The analysis exhibits the signs of multicollinearity such as, estimates of the coefficients vary from model to model. The t-tests for each of the individual slopes are non-significant (P > 0.05), but the overall F-test for testing all of the slopes are simultaneously 0 is significant (P < 0.05). The correlations among pairs of predictor variables are large. Looking at correlations only among pairs of predictors, however, is limiting. It is possible that the pairwise correlations are small, and yet a linear dependence exists among three or even more variables, for example, if X3 = 2X1 + 5X2 + error, say. That's why many regression analysts often rely on what are called variance inflation factors (VIF) to help detect multicollinearity. f) (2) How could you rerun the regression taking into account the issues in the first regression? Which variable or variables would you keep as an explanatory variable(s), and why? Natural gas and gasoline: They have almost perfect correlation with Electricity cost wich means that they are perfect to be used as explanatory variables. 3

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

The Homework Clubs Preparing For Algebra Math Help For Struggling Kids

Authors: Susan Everingham

1st Edition

1723708585, 978-1723708589

More Books

Students also viewed these Mathematics questions