
ECMT1020: Introduction to Econometrics
Lecture 10: Multivariate regression and transformations
Peter Exterkate (peter.exterkate@sydney.edu.au), Merewether Building, Office 343, University of Sydney
27 May 2016

Introduction
- In the previous three lectures, we saw pretty much all there is to know about multivariate linear least squares regression. Today's lecture will have a more applied focus: how can we use data transformations to our advantage in this model?
- As stated before, logarithms are among the most useful transformations in econometrics. They turn out to work in the same way here as they do in bivariate models, so we won't need to spend much time on them today.
- More interesting are polynomial models, and models with interaction effects in them.
- However, some of the greatest applications of multivariate regression techniques involve dummy variables. You will notice that I am covering Chapter 16 in reverse order, just so I can devote an entire hour to this topic.

Refresher about regression with logarithms
- Recall how we introduced logarithms in bivariate regression models. They were useful because they allowed us to model relative changes.
- For example, in the bivariate model ln y_i = β1 + β2 x_2i + u_i, the estimated impact of a one-unit change in x_2 is a 100·β2 % change in y.
- A similar interpretation works in the multivariate model.
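As a quick numerical illustration of this "100·β2 %" reading, here is a minimal sketch; the coefficient value below is hypothetical, not from any estimated model:

```python
import math

# Hypothetical log-model estimate: ln(y) = b1 + b2*x2 + u, with b2 = 0.05.
b2 = 0.05

# Rule-of-thumb reading: a one-unit rise in x2 changes y by about 100*b2 percent.
approx_pct = 100 * b2                      # 5.0

# Exact effect: y is multiplied by exp(b2), i.e. a 100*(exp(b2)-1) percent change.
exact_pct = 100 * (math.exp(b2) - 1)       # about 5.13

print(approx_pct, round(exact_pct, 2))

# The approximation is good for small coefficients and drifts for large ones:
for b in (0.05, 0.5):
    print(b, 100 * b, round(100 * (math.exp(b) - 1), 1))
```

For β2 = 0.5 the exact change is about 64.9%, not 50%, which is why the percentage interpretation should be applied with care for large coefficients.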
We just need to keep the "partial effect" / "ceteris paribus" / "all else being equal" interpretation in mind.
- Thus, in ln y_i = β1 + β2 x_2i + β3 x_3i + u_i, the estimated impact of a one-unit change in x_2 while x_3 does not change is a 100·β2 % change in y.

Applications of logarithms
- If we care about forecasting and our model is specified in terms of ln y, we also need to take retransformation bias into account: ŷ = exp(fitted ln y + s_e²/2), as before.
- A common application of log-log models is still to estimate elasticities, as it was in bivariate models: if I decrease my price by 1% and my competitor does not change his price, how much more (in % terms) am I going to sell?
- Logarithms also commonly arise from multiplicative models, which are quite common in economics. A leading example is the Cobb-Douglas model, where a firm's production y_i is related to capital k_i and labour ℓ_i by y_i = A k_i^β2 ℓ_i^β3.
- Taking logarithms, ln y_i = ln A + β2 ln k_i + β3 ln ℓ_i, so we can estimate the parameters of interest using the regression model ln y_i = β1 + β2 ln k_i + β3 ln ℓ_i + u_i.

Quadratic models
- The most basic quadratic model is y = β1 + β2 x + β3 x² + u.
- The regression "line" looks like a parabola in this case. In particular, there is a value of x where y turns from being a decreasing function of x to being an increasing function, or vice versa; this happens at x = −β2/(2β3). Some possible shapes are drawn on the next slide.
- The good news is that we can still use linear regression techniques to estimate such a model!
Just generate a new variable z = x² and estimate y = β1 + β2 x + β3 z + u.
- What matters is that the thing we're squaring is data, not parameters. Terminology: we can handle models that are linear in their parameters.

[Figure: plots of y = β1 + β2 x + β3 x² for various coefficient values, showing both U-shaped and inverted-U-shaped parabolas.]

Applications of quadratic models
- Such a model makes sense if we expect y to first increase, then decrease (or the other way around) as a function of x.
- This could happen with earnings as a function of age. First, your wages go up because you gain experience; later in life, you may choose to work fewer hours and earn less as a result.
- Another common application is the Laffer curve, where tax revenues are modelled as a function of the tax rate.
- Of course, it could be the case that the top x = −β2/(2β3) lies far outside the range of values you observe for x. In that case, you'll always see y increase (or decrease) as a function of x, but with a non-constant slope.

Marginal effects
- Recall that β3 is a partial effect.
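A minimal numeric sketch of the quadratic shape and its slope; the coefficients below are made up to produce an inverted-U, in the spirit of one panel of the figure:

```python
# Quadratic regression line y = b1 + b2*x + b3*x^2 (hypothetical coefficients).
b1, b2, b3 = 1.0, 3.0, -0.25

def yhat(x):
    return b1 + b2 * x + b3 * x ** 2

def marginal_effect(x):
    # Slope of the curve at x: dy/dx = b2 + 2*b3*x (it depends on x, not just b3)
    return b2 + 2 * b3 * x

x_top = -b2 / (2 * b3)            # turning point: here -3 / (2 * -0.25) = 6.0
print(x_top)                      # 6.0
print(marginal_effect(4.0) > 0)   # True: still increasing left of the top
print(marginal_effect(8.0) < 0)   # True: decreasing right of the top
print(marginal_effect(x_top))     # 0.0 exactly at the top
```

Flipping the sign of b3 to +0.25 turns the hill into a valley, with the same turning-point formula.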
Thus, it would be interpreted as "the expected change in y, if x² changes but x doesn't". That interpretation makes no sense here, because x² moves whenever x does; the relevant quantity is instead the marginal effect dy/dx = β2 + 2β3 x, which varies with x.

ECMT1020: Introduction to Econometrics
Lecture 11: Diagnostics and misspecification
Peter Exterkate (peter.exterkate@sydney.edu.au), Merewether Building, Office 343, University of Sydney
3 June 2016

Introduction
- This semester, we investigated the linear regression model, and found a lot of situations in which it could be applied, as well as tests for many different hypotheses.
- All of those results depend on certain assumptions on our data and on the model. Today, we look at diagnostics to see whether we can trust those assumptions.
- First, some things can go wrong with the data. Outliers have a very heavy impact on our estimates, and multicollinearity is a problem even if the collinearity is not perfect.
- Finally, we discuss how to see whether our model assumptions appear to be violated, and what to do if they are.

Outliers in bivariate models
- Recall from way back in univariate statistics that an outlier is an observation that is much larger or much smaller than the others.
- With more than one variable, the situation is more complicated. A person making $200,000 a year is an outlier... unless you condition on this person being a heart surgeon.
- In bivariate regression we have b2 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)². One can see that an observation with x_i far from x̄ has a serious impact. If you find such an observation, first check that it is correct (there might be a typo); fix it if possible, otherwise drop that observation.
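To see how heavy that impact can be, here is a small sketch with made-up data; the helper simply implements the slope formula above:

```python
# Bivariate OLS slope: b2 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

# Made-up data lying roughly on the line y = 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(ols_slope(x, y), 2))      # 1.99: close to the true slope of 2

# Add one observation with x far from xbar and a y value far off the line:
# the estimated slope collapses toward zero.
print(round(ols_slope(x + [20], y + [5.0]), 2))
```

A single observation with x_i far from x̄ dominates both sums in the formula, which is exactly why such points deserve a second look before estimation.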
If the data is actually correct, we'll have to live with it. It is probably good to report results both with and without that observation.

[Figure: bivariate data with a "good" outlier (far from x̄ but on the regression line) and a "bad" outlier (far off the line).]

Detecting outliers
- In the bivariate case, we can still plot the data and visually find outliers. Just plot y versus x; outliers are points that are far away from all of the others.
- Other good plots to make are e versus x and e versus ŷ. There should be no patterns in these plots if all assumptions are satisfied, so if there are patterns, this might indicate issues like omitted regressors or heteroskedasticity; more on that toward the end of this lecture.
- If you have time series data, plotting e_t versus t also makes sense, because it usually allows you to spot both heteroskedasticity and autocorrelation at a glance.

Outliers in multivariate models
- Things are more complicated in the multivariate case. An observation can be an outlier without any of its values being special in isolation, if they just form a strange combination.
- A person with sixteen years of education? Sure, plenty of those exist. A fourteen-year-old person? Nothing special about that. But a fourteen-year-old with sixteen years of schooling...
- Common practice is to plot e versus each regressor x_j separately, if there aren't too many of them.
Otherwise, plotting e versus ŷ can also often reveal outlying observations.

[Figure: residuals versus fitted values, showing a heteroskedasticity pattern.]

[Figure: residuals versus fitted values, showing a nonlinearity pattern.]

Frisch and Waugh to the rescue
- Another way to spot outliers in multivariate regression is obtained by thinking about partial effects.
- Plot "the part of y that is not explained by x3, x4, ..." versus "the part of x2 that is not explained by x3, x4, ...".

Review questions

1. In the regression model ln(demand) = β1 + β2 ln(ownprice) + β3 ln(competitorsprice) + u, how would you interpret the coefficient β2?
   - A 1% increase in my own price will lead to a β2 unit drop in demand, taking into account my competitor's reaction to my price change.
   - A 1% increase in my own price will lead to a β2 unit drop in demand, assuming my competitor does not change her price.
   - A 1% increase in my own price will lead to a β2 % drop in demand, taking into account my competitor's reaction to my price change.
   - A 1% increase in my own price will lead to a β2 % drop in demand, assuming my competitor does not change her price.

2. Consider the regression model y_i = β1 + β2 x_i + β3 x_i² + u_i. In which case does this model describe an inverted U-shaped relationship between X and Y?
   - If β3 < 0, no matter what β2 is.
   - Only if β2 > 0 and β3 < 0.
   - If β3 > 0, no matter what β2 is.
   - Only if β2 < 0 and β3 > 0.

3.
Consider the Laffer curve revenue = 40·rate − 0.4·rate² + u, expressing how total tax revenues (in trillions of dollars) vary with the average income tax rate (in percent, so rate = 30 would correspond to a 30% tax rate). If taxes are currently at 30%, what is the marginal effect on revenues of a small change in the tax rate?
   a. Δrevenue ≈ 40·Δrate
   b. Δrevenue ≈ 16·Δrate
   c. Δrevenue ≈ −16·Δrate
   d. Δrevenue ≈ −40·Δrate

4. Suppose we are trying to model y as a polynomial function of x. Which of the following is NOT a valid reason to pick a fourth-order polynomial over a third-order polynomial?
   - Adding the fourth-order term improved the BIC.
   - We actually believe that y behaves according to such a model.
   - The coefficient on the fourth-order term is significant.
   - The R² is higher in the fourth-order model.

5. In the regression model y_i = β1 + β2 x_i + β3 z_i + β4 x_i z_i + u_i, what does the term "β4 x_i z_i" capture? Assume all coefficients are positive.
   - X and Z both have a positive influence on Y.
   - If Z is higher, the expected value of Y is higher, regardless of the value of X.
   - If Z is higher, the impact of an increase in X on the expected value of Y is greater.
   - If Z is higher, X also tends to be higher.

6. In the regression model y_i = β1 + β2 x_i + β3 z_i + β4 x_i z_i + u_i, how do we assess the significance of z_i?
   - Using a t test for H0: β3 = 0.
   - Using two t tests, first for H0: β3 = 0, and if that one doesn't reject, also for H0: β4 = 0.
   - Using two t tests, first for H0: β4 = 0, and if that one doesn't reject, also for H0: β3 = 0.
   - Using an F test for H0: β3 = β4 = 0.

7. Suppose that a researcher, using wage data on randomly selected male and female workers, obtains the estimated regression model wage_i = 12.52 + 2.12·male_i + e_i, where wage is measured in dollars per hour and male is a dummy variable that is equal to 1 if the person is male and 0 if the person is female.
Another researcher uses the same data, but regresses wage on female, a dummy variable that is equal to 1 if the person is female and 0 if the person is male. What are the regression estimates obtained from this regression?
   - wage_i = 10.40 − 2.12·female_i + e_i.
   - wage_i = 12.52 − 2.12·female_i + e_i.
   - wage_i = 14.64 − 2.12·female_i + e_i.
   - We don't have enough information to answer this question.

8. Finish the following sentence. When using dummy variables to deal with a categorical variable that takes k values, one needs to fit a model with a constant term and...
   - ... one dummy variable that takes k categorical values.
   - ... k − 1 dummy variables that take the values zero and one.
   - ... k dummy variables that take the values zero and one.
   - ... k + 1 dummy variables that take the values zero and one.

9. Let d be a dummy variable, and x a regular regressor. We assume that y_i = β1 + β2 x + u for individuals with d = 0, and y_i = γ1 + γ2 x + u for individuals with d = 1. We are interested in testing whether both regression lines are equal, so we run the Chow test regression y = β1 + δ1 d + β2 x + δ2 dx + u. Which hypothesis do we need to test in this last regression?
   a. δ2 = 0
   b. β2 = δ2 = 0
   c. δ1 = δ2 = 0
   d. δ1 = β2 = δ2 = 0

10. In the Chow test regression model y = β1 + δ1 d + β2 x + δ2 dx + u, what would it mean if δ2 = 0?
   - The marginal effect of x on y is equal in both groups.
   - For individuals with d = 1, x has no effect on y.
   - If x = 0, both groups have the same expected value for y.
   - The average values of x are equal in both groups.

11. Which of the following are measures of how heavily an outlier affects the estimated regression line?
   - DFITS.
   - DFBETA.
   - Both of the above.
   - None of the above.

12. Finish the following sentence. Multicollinearity occurs when two or more explanatory variables are highly correlated with...
   - ... each other.
   - ... their own lags.
   - ... the error term.
   - ... the dependent variable.

13. Which of the following situations causes the least squares estimators to be biased in general?
   - Omitted variables.
   - Redundant variables.
   - Both of the above.
   - None of the above.

14. Suppose the RESET test rejects its null hypothesis. What should you do?
   - Continue working with the linear model.
   - Work with the unrestricted model that was estimated for the RESET test.
   - Add the squares and cubes of all explanatory variables to the model.
   - Search for appropriate nonlinear functions to be added as regressors.

15. If we wish to test whether Cov[u,x] = 0, what is wrong with using Cov[e,x] for that purpose?
   - Nothing, a simple t test should work fine.
   - Cov[e,x] is subject to too much sampling error to be practically useful in small samples. In large enough samples, there is no problem.
   - Cov[e,x] is not informative about Cov[u,x], even in very large samples.
   - Sample selection bias tends to make Cov[e,x] > 0 even if Cov[u,x] = 0 in the population.

16. What is the effect of endogeneity on the least squares estimators?
   - They are still unbiased, consistent, and BLUE (best linear unbiased estimator); we just need to adjust the degrees of freedom.
   - They are still unbiased and consistent, but not BLUE.
   - They are still unbiased, but not consistent.
   - They become biased, and lose their consistency as well.

17. Which of the following problems would be visible on a residual plot of e_t versus t?
   - Heteroskedasticity.
   - Autocorrelation.
   - Both of the above.
   - None of the above.

18. Which of the following statements is NOT true?
   - In the presence of heteroskedasticity, we can still use our usual t and F tests, as long as we adjust the degrees of freedom.
   - Multicollinearity among explanatory variables makes it harder to reject null hypotheses that the true parameters in a regression are zero.
   - If our sample is large enough, we can be confident that the t and F statistics have approximately the correct distributions, even if the disturbance terms are not normally distributed.
   - In a regression model with 25 observations and three explanatory variables other than the intercept term, we have 21 degrees of freedom.

19.
How does the Central Limit Theorem help us in regression models with non-normal errors?
   - It doesn't.
   - It tells us that the distribution of the disturbance terms will tend to normal as the sample size grows.
   - It tells us that the distribution of t and F statistics is correct even in small samples, despite the non-normality of the disturbance terms.
   - It tells us that the distribution of t and F statistics tends to the correct one as the sample size grows.

20. How would a Q-Q plot look if the disturbance terms have a kurtosis much smaller than three, but no skewness?
   - Points would generally fall above the line in both tails.
   - Points would generally fall below the line in both tails.
   - Points would generally fall above the line in the left tail, and below the line in the right tail.
   - Points would generally fall below the line in the left tail, and above the line in the right tail.
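As a sanity check on the arithmetic behind questions 3 and 7, here is a short sketch; all numbers are taken from the question stems:

```python
# Q3: Laffer curve revenue = 40*rate - 0.4*rate**2.
# The marginal effect is the derivative: d(revenue)/d(rate) = 40 - 0.8*rate.
rate = 30
marginal = 40 - 0.8 * rate
print(marginal)    # 16.0: a small rate change moves revenue by about 16x that amount

# Q7: flipping the dummy from 'male' to 'female' only swaps the reference group.
# The fitted group means must stay the same: males 12.52 + 2.12, females 12.52.
mean_male = 12.52 + 2.12              # 14.64
mean_female = 12.52
intercept_female_reg = mean_male               # new reference group is males
slope_female_reg = mean_female - mean_male     # -2.12
print(intercept_female_reg, round(slope_female_reg, 2))
```

The dummy-flip logic also answers why the intercept in the female regression must be 14.64 rather than 12.52: the intercept is always the mean of the omitted category.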
