PLEASE PROVIDE FORMULAS AND STEP-BY-STEP. THANK YOU.
Assignment 4

1. Suppose you run a multivariate regression and one of the coefficients is not statistically significant at any of the usual significance levels (i.e., it has a large p-value). Is it okay to remove the corresponding variable from your analysis?
a. Yes! Control variables don't matter, and omitted variable bias is a myth.
b. Yes! I read that you could do this on a website somewhere, so it's fine.
c. Yes! If it's not statistically significant, then it has no place in your regression.
d. No! While this will likely (but not certainly) decrease the standard errors of your remaining coefficient estimates, it does so at the risk of biasing these estimates.

Use the following information to answer questions 2 - 4: Suppose you wish to run a multivariate regression to determine the relationship between household income ($Income_i$) and home-ownership ($OwnsHome_i$ = 1 if home owner, = 0 otherwise), controlling for "Region" of the country ($Region_i$ takes values 1, 2, 3, or 4, and $Region_i$ = 1 if West, = 2 if Northwest, = 3 if Northeast, or = 4 if South).

2. Suppose you use your "Region" variable to create four new variables: $West_i$ = 1 if lives in the West, $NW_i$ = 1 if lives in the Northwest, $NE_i$ = 1 if lives in the Northeast, and $South_i$ = 1 if lives in the South. What did you just do?
a. You used a string variable to create four numeric variables.
b. You used a categorical variable to create four indicator variables.
c. You used a continuous variable to create a semi-continuous variable.
d. You used a binary variable to create four continuous variables.

3. Which of the following regression specifications is most appropriate for answering your question given your data?
a. $OwnsHome_i = \beta_0 + \beta_1 Income_i + \beta_2 Region_i + u_i$
b. $OwnsHome_i = \beta_0 + \beta_1 Income_i + \beta_2 West_i + \beta_3 NW_i + \beta_4 NE_i + \beta_5 South_i + u_i$
c. $OwnsHome_i = \beta_0 + \beta_1 Income_i + \beta_2 West_i + \beta_3 NW_i + \beta_4 NE_i + u_i$
d.
$Income_i = \beta_0 + \beta_1 OwnsHome_i + \beta_2 Region_i + u_i$

4. Suppose you ran the following regression: $OwnsHome_i = \beta_0 + \beta_1 West_i + \beta_2 NW_i + \beta_3 NE_i + u_i$. What will the OLS estimate of the $\beta_0$ coefficient ($\hat{\beta}_0$) tell you?
a. The percent of people in the data who are estimated to own a home if they do not live in any of the four regions.
b. The percent of people in the South who own a home.
c. The difference between the percent of people in the South who own a home and the percent of people not in the South who own a home.
d. Nothing, it's just a meaningless intercept term.

5. Suppose you ran the following regression: $OwnsHome_i = \beta_0 + \beta_1 West_i + \beta_2 NW_i + \beta_3 NE_i + u_i$. What will the OLS estimate of the $\beta_1$ coefficient ($\hat{\beta}_1$) tell you?
a. The percent of people in the West who are estimated to own a home.
b. How much more or less likely people in the West are to own a home, relative to the other three regions.
c. How much more or less likely people in the West are to own a home, relative to people in the South.
d. Nothing, it's just a meaningless slope term.

Use the following information to answer questions 6 - 11: Consider the output in the following regression table.

Table 1 - Effect of SAT Prep Course on SAT Math Score*
(Dependent Variable: SAT Math score - out of 800 total points)
(Standard errors in parentheses)

Independent Variable                        OLS Coefficient Estimate   OLS Coefficient Estimate
                                            (Model 1)                  (Model 2)
Self-reported "Study Time" (Hours)          1.20 (0.48)                0.84 (0.35)
Took Prep Course (= 1 if yes, 0 if no)      68.40 (16.32)              52.12 (16.08)
Took Prep Course x Study Time               -                          1.12 (0.47)
  (interaction of the above two variables)
Intercept                                   428.65 (28.88)             465.97 (29.32)
N                                           1,200                      1,200
Adjusted R^2                                0.22                       0.31

*Note: This analysis is (a) fake, and (b) even if it were based on real data, would suffer from endogeneity (reverse causality) bias (and therefore the coefficient estimates shouldn't be trusted). Why the endogeneity bias? Because students who need more help taking the SAT may be more likely to take an SAT prep course.
So if we really want to answer the question "What is the effect of the prep course on SAT score?" the way to do so would be to either (1) randomly assign a mandatory SAT prep course to students who are all going to take the SATs (this will tell you the effect of the course on SAT scores), or (2) randomly admit only some of the students who wanted to take the prep course and don't let the other ones take it (this would tell you the effect of the course on scores for students who actually wanted to take the course - this is different from item (1) here), though this suffers from the fact that trying to get in and not being allowed entry might make those students work harder, so that's not so good. With more data you could perform other analyses as well, which we won't discuss here, but this discussion highlights the importance of knowing your data and thinking about your analysis, because - even if it looks clean (like it does here) - the results might not be trustworthy if you did not perform the proper analysis for your data. Regardless, answer the questions below as though the coefficients are unbiased.

6. Which of the variables in Model 2 are statistically significant with 95% confidence? (Select ALL that apply to receive full credit)
a. Study Time
b. Took prep course
c. The interaction term
d. The intercept term

7. Which of the following best describes the difference between Model 1 and Model 2?
a. The coefficient estimates in Model 1 are biased because they do not control for omitted factors that Model 2 does address.
b. The effect of study time on SAT score for students who took a prep course is estimated to be greater by Model 1 than by Model 2.
c. Model 1 forces the effect of study time on SAT score to be the same regardless of whether or not the student took a prep course.
d. Model 2 estimates that students who didn't take a prep course and didn't study will do worse than what Model 1 estimates.

8.
Use the output from Model 1 to answer this question: Taking the SAT prep course has the equivalent effect on score as how many hours of study time?

9. Use the output from Model 2 to answer this question: For a student who took the SAT prep course, one more hour of study time is estimated to cause an X unit change in SAT score. What is the value of X?

10. Use the output from Model 2 to answer this question: What is the estimated SAT score of a student who took the prep course and studied for 60 hours?

Use the following information to answer questions 11-12: Suppose you have data on housing prices for 28,000 randomly selected homes sold throughout the central coast in the past year ($i = 1, \ldots, 28000$) and you estimate:

$\ln(Price_i) = 0.29 + 1.86 \cdot \ln(SqFt_i) + 0.122 \cdot BlueRibbonSchool_i$

where $\ln(Price_i)$ is the natural log of the home's sale price, $\ln(SqFt_i)$ is the natural log of the home's square footage, and $BlueRibbonSchool_i$ is an indicator variable = 1 if the home is zoned for a Blue-Ribbon school (a good thing), and = 0 otherwise.

11. A 1% increase in square footage is estimated to cause what % increase in sale price? (If your answer is 2.36%, write 2.36 for your answer, do not write "2.36%" or "0.0236")

12. What % increase in sale price is associated with the home being zoned for a Blue-Ribbon school? (If your answer is 2.36%, write 2.36 for your answer, do not write "2.36%" or "0.0236")

13. What is the estimated sales price for a 1600 square foot home zoned for a Blue-Ribbon school?

14. What is the estimated sales price for a 1600 square foot home not zoned for a Blue-Ribbon school?

Use the "Home Prices.xlsx" dataset to answer questions 15-18.

15. Run a multivariate regression with Sale Price as your dependent variable and Square Footage, School Score, and Recently Renovated as your independent variables. By how much is the sale price of a home estimated to increase as a result of a 1-point increase in school score?

16.
Now, run a multivariate regression with the natural log of Sale Price as your dependent variable, and the natural log of Square Footage, School Score, and Recently Renovated as your independent variables. What percent change in sale price is associated with a 1-point increase in school score? (If your answer is 23%, enter 23, do not enter "23%" or "0.23")

17. Using your model with Sale Price as your dependent variable and Square Footage, School Score, and Recently Renovated as your independent variables (so, from #15, NOT #16), what is the estimated sales price of the house with ID = 1?

18. What percent of homes in this dataset sold for less than their estimated sale price (using the model from #15)? (If your answer is 23%, enter 23, do not enter "23%" or "0.23")
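
Since the post asks for formulas and step-by-step work, here is a minimal Python sketch of the arithmetic behind questions 8-14, using only the coefficients printed in Table 1 and in the log-price equation. The function and dictionary names are made up for illustration and are not part of the assignment. Two assumptions are worth flagging: the price prediction simply exponentiates the fitted log price (no retransformation correction), and for the Blue-Ribbon indicator the sketch shows both the usual "100 x coefficient" approximation and the exact 100 x (exp(coefficient) - 1) figure, since the course may expect either.

```python
import math

# Coefficients copied from Table 1 (hypothetical variable names).
MODEL_1 = {"intercept": 428.65, "study": 1.20, "prep": 68.40, "interaction": 0.0}
MODEL_2 = {"intercept": 465.97, "study": 0.84, "prep": 52.12, "interaction": 1.12}


def prep_in_study_hours(model):
    """Q8: hours of study with the same estimated effect as the prep course.

    Formula: b_prep / b_study (meaningful for Model 1, which has no interaction).
    """
    return model["prep"] / model["study"]


def study_slope(model, prep):
    """Q9: change in score per extra study hour.

    Formula: b_study + b_interaction * prep, where prep is 0 or 1.
    """
    return model["study"] + model["interaction"] * prep


def predict_score(model, hours, prep):
    """Q10: fitted score = b0 + b_study*hours + b_prep*prep + b_int*prep*hours."""
    return (model["intercept"]
            + model["study"] * hours
            + model["prep"] * prep
            + model["interaction"] * prep * hours)


def predict_price(sqft, blue_ribbon):
    """Q13-14: exp of the fitted log price (assumes no retransformation correction).

    ln(Price) = 0.29 + 1.86*ln(SqFt) + 0.122*BlueRibbonSchool
    """
    log_price = 0.29 + 1.86 * math.log(sqft) + 0.122 * blue_ribbon
    return math.exp(log_price)


def pct_effect_approx(coef):
    """Percent change read directly off a coefficient in a log-outcome model."""
    return 100 * coef


def pct_effect_exact(coef):
    """Exact percent change implied by a 0-to-1 switch in an indicator."""
    return 100 * (math.exp(coef) - 1)


if __name__ == "__main__":
    print("Q8  hours equivalent:", round(prep_in_study_hours(MODEL_1), 2))
    print("Q9  slope with prep: ", round(study_slope(MODEL_2, prep=1), 2))
    print("Q10 predicted score: ", round(predict_score(MODEL_2, hours=60, prep=1), 2))
    print("Q11 elasticity (%):  ", 1.86)  # log-log coefficient is the elasticity
    print("Q12 approx / exact:  ", round(pct_effect_approx(0.122), 2),
          "/", round(pct_effect_exact(0.122), 2))
    print("Q13 price, zoned:    ", round(predict_price(1600, 1), 2))
    print("Q14 price, not zoned:", round(predict_price(1600, 0), 2))
```

The interaction term is why Q9 adds two coefficients: in Model 2 the slope on study time depends on whether the student took the course, whereas Model 1 fixes it at a single value.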