Three Examples

We now turn to three empirical applications of IV regression that provide examples of how different researchers used their expert knowledge of their empirical problem to find instrumental variables.

Does putting criminals in jail reduce crime? This is a question only an economist would ask. After all, a criminal cannot commit a crime outside jail while in prison, and the fact that some criminals are caught and jailed serves to deter others. But the magnitude of the combined effect (the change in the crime rate associated with a 1% increase in the prison population) is an empirical question.

One strategy for estimating this effect is to regress crime rates (crimes per 100,000 members of the general population) against incarceration rates (prisoners per 100,000), using annual data at a suitable level of jurisdiction (for example, U.S. states). This regression could include some control variables measuring economic conditions (crime increases when general economic conditions worsen), demographics (youths commit more crimes than the elderly), and so forth. There is, however, a serious potential for simultaneous causality bias that undermines such an analysis: if the crime rate goes up and the police do their job, there will be more prisoners. On the one hand, increased incarceration reduces the crime rate; on the other hand, an increased crime rate increases incarceration. As in the butter example in Figure 10.1, because of this simultaneous causality an OLS regression of the crime rate on the incarceration rate will estimate some complicated combination of these two effects. This problem cannot be solved by finding better control variables.

This simultaneous causality bias, however, can be eliminated by finding a suitable instrumental variable and using TSLS. The instrument must be correlated with the incarceration rate (it must be relevant), but it must also be uncorrelated with the error term in the crime rate equation of interest (it must be exogenous). That is, it must affect the incarceration rate but be unrelated to any of the unobserved factors that determine the crime rate.

Where does one find something that affects incarceration but has no direct effect on the crime rate? One place is exogenous variation in the capacity of existing prisons. Because it takes time to build a prison, short-term capacity restrictions can force states to release prisoners prematurely or otherwise reduce incarceration rates. Using this reasoning, Levitt (1996) suggested that lawsuits aimed at reducing prison overcrowding could serve as an instrumental variable, and he implemented this idea using panel data for the U.S. states from 1972 to 1993.

Are variables measuring overcrowding litigation valid instruments? Although Levitt did not report first-stage F-statistics, the prison overcrowding litigation slowed the growth of prisoner incarcerations in his data, suggesting that this instrument is relevant. To the extent that overcrowding litigation is induced by prison conditions but not by the crime rate or its determinants, this instrument is exogenous. Because Levitt breaks down overcrowding litigation into several types, and thus has several instruments, he is able to test the overidentifying restrictions; he fails to reject them using the J-test, which bolsters the case that his instruments are valid.

Using these instruments and TSLS, Levitt estimated the effect of incarceration on the crime rate to be substantial. This estimated effect was three times larger than the effect estimated using OLS, suggesting that OLS suffered from large simultaneous causality bias.

Does cutting class sizes increase test scores? As we saw in the empirical analysis of Part II, schools with small classes tend to be wealthier, and their students have access to enhanced learning opportunities both in and out of the classroom. In Part II, we used multiple regression to tackle the threat of omitted variables bias by controlling for various measures of student affluence, ability to speak English, and so forth. Still, a skeptic could wonder whether we did enough: if we left out something important, our estimates of the class size effect would still be biased.

This potential omitted variables bias could be addressed by including the right control variables, but if these data are unavailable (some, like outside learning opportunities, are hard to measure), then an alternative approach is to use IV regression. This regression requires an instrumental variable correlated with class size (relevance) but uncorrelated with the omitted determinants of test performance that make up the error term, such as parental interest in learning, learning opportunities outside the classroom, quality of the teachers and school facilities, and so forth (exogeneity).

Where does one look for an instrument that induces random, exogenous variation in class size, but is unrelated to the other determinants of test performance? Hoxby (2000) suggested biology. Because of random fluctuations in the timing of births, the size of the incoming kindergarten class varies from one year to the next. Although the actual number of children entering kindergarten might be endogenous (recent news about the school might influence whether parents send a child to a private school), she argued that the potential number of children entering kindergarten (the number of four-year-olds in the district) is mainly a matter of random fluctuations in the birth dates of children.
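The omitted variables problem in the class size example, and the logic of using an instrument such as random birth-timing fluctuations, can be illustrated with a small simulation. This is a sketch under stated assumptions, not Hoxby's data or method: every number is invented. It assumes a hypothetical unobserved variable, `outside` (standing in for outside learning opportunities), that lowers class size and raises test scores, and a hypothetical instrument, `cohort_shock` (standing in for a short-term spike in potential enrollment), that shifts class size but has no other effect on scores. The helpers `ols` and `tsls` are written here for illustration. `tsls` runs the two stages by hand and reports the homoskedasticity-only first-stage F-statistic.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients by least squares; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def tsls(y, X_exog, x_endog, Z):
    """Two stage least squares with a single endogenous regressor (illustrative helper).

    y       : (n,) dependent variable
    X_exog  : (n, k) included exogenous regressors, including the intercept
    x_endog : (n,) endogenous regressor
    Z       : (n, m) excluded instruments
    Returns (second-stage coefficients, first-stage F-statistic);
    the coefficient on the endogenous regressor is the last element.
    """
    n, m = Z.shape
    # Stage 1: regress the endogenous regressor on the exogenous regressors and instruments
    W = np.column_stack([X_exog, Z])
    x_hat = W @ ols(x_endog, W)

    # Stage 2: replace the endogenous regressor by its first-stage predicted value
    beta = ols(y, np.column_stack([X_exog, x_hat]))

    # Homoskedasticity-only first-stage F-statistic for H0: all instrument coefficients are zero
    ssr_unrestricted = np.sum((x_endog - x_hat) ** 2)
    ssr_restricted = np.sum((x_endog - X_exog @ ols(x_endog, X_exog)) ** 2)
    F = ((ssr_restricted - ssr_unrestricted) / m) / (ssr_unrestricted / (n - W.shape[1]))
    return beta, F


# Simulated data: all parameters are hypothetical, chosen only for illustration
rng = np.random.default_rng(0)
n = 5_000
true_effect = -0.5                  # assumed effect of one extra student on the test score

outside = rng.normal(size=n)        # hypothetical omitted variable: outside learning opportunities
cohort_shock = rng.normal(size=n)   # hypothetical instrument: birth-timing spike in potential enrollment
# Schools with more outside opportunities (wealthier) have smaller classes
class_size = 20 + 2.0 * cohort_shock - 1.5 * outside + rng.normal(size=n)
# outside raises scores directly, so it sits in the error term of the score equation
score = 650 + true_effect * class_size + 5.0 * outside + rng.normal(size=n)

const = np.ones((n, 1))
ols_slope = ols(score, np.column_stack([const, class_size]))[1]
tsls_coef, first_stage_F = tsls(score, const, class_size, cohort_shock.reshape(-1, 1))

print(f"OLS slope:      {ols_slope:.3f}")      # contaminated by the omitted variable
print(f"TSLS slope:     {tsls_coef[-1]:.3f}")  # uses only instrument-driven variation
print(f"First-stage F:  {first_stage_F:.1f}")
```

Because the omitted variable makes small classes coincide with better outside opportunities, OLS in this setup should overstate the benefit of small classes, while TSLS, which uses only the variation in class size driven by the instrument, should land near the assumed effect of -0.5. Running the second stage as an ordinary regression gives the correct TSLS coefficients but not correct standard errors, so the sketch reports point estimates only. The same mechanics carry over to the prison example, with overcrowding litigation in the role of the instrument.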
Is potential enrollment a valid instrument? Whether it is exogenous depends on whether it is correlated with unobserved determinants of test performance. Surely biological fluctuations in potential enrollment are exogenous, but potential enrollment also fluctuates because parents with young children choose to move into an improving school district and out of one in trouble. If so, an increase in potential enrollment could be correlated with unobserved factors such as the quality of school management, rendering this instrument invalid. Hoxby addressed this problem by reasoning that growth or decline in the potential student pool for this reason would occur smoothly over several years, whereas random fluctuations in birth dates would produce short-term "spikes" in potential enrollment. Thus, she used as her instrument not potential enrollment, but the deviation of potential enrollment from its long-term trend. These deviations satisfy the criterion for instrument relevance (the first-stage F-statistics all exceed 100). She makes a good case that this instrument is exogenous, but, as in all IV analysis, the credibility of this assumption is ultimately a matter of judgment.

Hoxby implemented this strategy using detailed panel data on elementary schools in Connecticut in the 1980s and 1990s. The panel data set permitted her to include school fixed effects, which, together with the instrumental variables strategy, address the problem of omitted variables bias at the school level. Her TSLS estimates suggested that the effect of class size on test scores is small; most of her estimates were not statistically significantly different from zero.

Does aggressive treatment of heart attacks prolong lives? New aggressive treatments for victims of heart attacks (technically, acute myocardial infarctions, or AMIs) hold the potential for saving lives. Before a new medical procedure (in this example, cardiac catheterization*) is approved for general use, it goes through clinical trials, a series of randomized controlled experiments designed to measure its effects and side effects. But strong performance in a clinical trial is one thing; actual performance in the real world is another.

*Cardiac catheterization is a procedure in which a catheter, or tube, is inserted into a blood vessel and guided all the way to the heart to obtain information about the heart and coronary arteries.

A natural starting point for estimating the real-world effect of cardiac catheterization is to compare patients who received the treatment to those who did not. This leads to regressing the length of survival of the patient against the binary treatment variable (whether the patient received cardiac catheterization) and other control variables that affect mortality (age, weight, other measured health conditions, etc.). The population coefficient on the indicator variable is the increment to the patient's life expectancy provided by the treatment. Unfortunately, the OLS estimator is subject to bias: cardiac catheterization does not "just happen" to a patient randomly; rather, it is performed because the doctor and patient decide that it might be effective. If their decision is based in part on unobserved factors relevant to health outcomes not in the data set, then the treatment decision will be correlated with the regression error term. If the healthiest patients are the ones who receive the treatment, the OLS estimator will be biased (treatment is correlated with an omitted variable), and the treatment will appear more effective than it really is.

This potential bias can be eliminated by IV regression using a valid instrumental variable. The instrument must be correlated with treatment (must be relevant) but must be uncorrelated with the omitted health factors that affect survival (must be exogenous).

Where does one look for something that affects treatment but not the health outcome, other than through its effect on treatment? McClellan, McNeil, and Newhouse (1994) suggested geography. Most hospitals in their data set did not specialize in cardiac catheterization, so many patients were closer to "regular" hospitals that did not offer this treatment than to cardiac catheterization hospitals. McClellan, McNeil, and Newhouse therefore used as an instrumental variable the difference between the distance from the AMI patient's home to the nearest cardiac catheterization hospital and the distance to the nearest hospital of any sort; this distance is zero if the nearest hospital is a cardiac catheterization hospital, and otherwise it is positive. If this relative distance affects the probability of receiving this treatment, then it is relevant. If it is distributed randomly across AMI victims, then it is exogenous.

Is relative distance to the nearest cardiac catheterization hospital a valid instrument? McClellan, McNeil, and Newhouse do not report first-stage F-statistics, but they do provide other empirical evidence that it is not weak. Is this distance measure exogenous? They make two arguments. First, they draw on their medical expertise and knowledge of the health care system to argue that distance to a hospital is plausibly uncorrelated with any of the unobservable variables that determine AMI outcomes.
Second, they have data on some of the additional variables that affect AMI outcomes, such as the weight of the patient, and in their sample distance is uncorrelated with these observable determinants of survival; this, they argue, makes it more credible that distance is uncorrelated with the unobservable determinants in the error term as well.

Using 205,021 observations on Americans aged at least 64 who had an AMI in 1987, McClellan, McNeil, and Newhouse reached a striking conclusion: their TSLS estimates suggest that cardiac catheterization has a small, possibly zero, effect on health outcomes; that is, cardiac catheterization does not substantially prolong life. In contrast, the OLS estimates suggest a large positive effect. They interpret this difference as evidence of bias in the OLS estimates.

McClellan, McNeil, and Newhouse's IV method has an interesting interpretation. The OLS analysis used actual treatment as the regressor, but because actual treatment is itself the outcome of a decision by patient and doctor, they argue that the actual treatment is correlated with the error term. Instead, TSLS uses predicted treatment, where the variation in predicted treatment arises because of variation in the instrumental variable: patients closer to a cardiac catheterization hospital are more likely to receive this treatment.

This interpretation has two implications. First, the IV regression actually estimates the effect of the treatment not on a "typical" randomly selected patient, but rather on patients for whom distance is an important consideration in the treatment decision. The effect on those patients might differ from the effect on a typical patient, which provides one explanation of the greater estimated effectiveness of the treatment in clinical trials than in McClellan, McNeil, and Newhouse's IV study. Second, it suggests a general strategy for finding instruments in this type of setting: find an instrument that affects the probability of treatment, but does so for reasons that are unrelated to the outcome except through their effect on the likelihood of treatment. Both these implications apply to experimental and "quasi-experimental" studies, the topic of Chapter 11.

10.6 Conclusion

From the humble start of estimating how much less butter people will buy if its price rises, IV methods have evolved into a general approach for estimating regressions when one or more variables are correlated with the error term. Instrumental variables regression uses the instruments to isolate variation in the endogenous regressors that is uncorrelated with the error in the regression of interest; this is the first stage of two stage least squares. This in turn permits estimation of the effect of interest in the second stage of two stage least squares.

Successful IV regression requires valid instruments, that is, instruments that are both relevant (not weak) and exogenous. If the instruments are weak, then the TSLS estimator can be biased, even in large samples, and statistical inferences based on TSLS t-statistics and confidence intervals can be misleading. Fortunately, when there is a single endogenous regressor, it is possible to check for weak instruments simply by checking the first-stage F-statistic.

If the instruments are not exogenous, that is, if one or more instruments is correlated with the error term, then the TSLS estimator is inconsistent. If there are more instruments than endogenous regressors, then instrument exogeneity can be examined by testing the overidentifying restrictions. However, the core assumption, that there are at least as many exogenous instruments as there are endogenous regressors, cannot be tested. It is therefore incumbent on both the empirical analyst and the critical reader to use their own understanding of the empirical application to evaluate whether this assumption is reasonable.

The interpretation of IV regression as a way to exploit known exogenous variation in the endogenous regressor can be used to guide the search for potential instrumental variables in a particular application. This interpretation underlies much of the empirical analysis in the area that goes under the broad heading of program evaluation, in which experiments or quasi-experiments are used to estimate the effect of programs, policies, or other interventions on some outcome measure. A variety of additional issues arises in those applications, for example, the interpretation of IV results when, as in the cardiac catheterization example, different "patients" might have different responses to the same "treatment." These and other aspects of empirical program evaluation are taken up in Chapter 11.

Summary

1. Instrumental variables regression is a way to estimate regression coefficients when one or more regressors are correlated with the error term.
2. Endogenous variables are correlated with the error term in the equation of interest; exogenous variables are uncorrelated with this error term.
3. For an instrument to be valid, it must (1) be correlated with the included endogenous variable and (2) be exogenous.
4. IV regression requires at least as many instruments as included endogenous variables.
5. The TSLS estimator has two stages: first, the included endogenous variables are regressed against the included exogenous variables and the instruments; second, the