Question


Posted on Jun 11, 2024

The purposes of this problem set are to review concepts and methods related to:

omitted variables bias
randomized control trials, and
difference-in-differences estimation

1. Consider an agricultural extension program that teaches farmers new cultivation methods. The program's aim is to increase farm household incomes. The program is rolled out by a district government. Because the government has a limited budget, there is funding to enroll only a fraction of the district's farmers in the program. The government staff who are responsible for enrolling farmers into the program are told to focus their limited resources on enrolling the neediest farmers within the district.

We wish to estimate the impact (i.e. causal effect) of this program on farm household incomes. One year after the program was rolled out, we draw a simple random sample of farm households within the district. In the sample, as in the population, some farmers have participated in the program and some have not. We run this regression

reg Y P

where Y is a measure of household income and P is a dummy variable equal to 1 if the farmer participated in the program and 0 if not. This regression omits many variables that help to determine farm household income, including household-level measures of farm size, land quality, and farmer's education.
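To make the mechanics of this regression concrete, here is a minimal Python sketch, using only the standard library, of what a regression of Y on a single dummy P computes. The data are invented purely to exercise the arithmetic and are not drawn from the problem set; the names `slope`, `intercept`, `mean_participants`, and `mean_others` are illustrative. The sketch only shows an arithmetic identity and says nothing about whether the estimate is causal.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: P is a 0/1 participation dummy, Y is household income.
# The numbers are arbitrary and chosen only for illustration.
n = 200
P = [random.randint(0, 1) for _ in range(n)]
Y = [1000 + 150 * p + random.gauss(0, 100) for p in P]

# OLS slope and intercept for a regression of Y on P (what `reg Y P` reports).
p_bar, y_bar = mean(P), mean(Y)
slope = (sum((p - p_bar) * (y - y_bar) for p, y in zip(P, Y))
         / sum((p - p_bar) ** 2 for p in P))
intercept = y_bar - slope * p_bar

# Mean income among participants and among non-participants.
mean_participants = mean(y for p, y in zip(P, Y) if p == 1)
mean_others = mean(y for p, y in zip(P, Y) if p == 0)

# The two printed numbers coincide: the slope is the difference in group means.
print(round(slope, 4), round(mean_participants - mean_others, 4))
```

In other words, the coefficient on P is simply the gap in average income between participants and non-participants in the sample, and the intercept is the non-participants' average.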

a. (2 points) In the research just described, are we undertaking a randomized control trial (RCT)? Explain.

b. (1 point) Do you think that variables measuring farm size, land quality, and farmer's education would tend to have positive or negative effects on Y, all else equal? (No explanation required.)

c. (3 points) Given what you know about the process through which farmers were selected into the program, do you expect the participation indicator P to be positively or negatively correlated with omitted variables such as farm size, land quality, and farmer's education? Explain briefly. Two to three sentences should suffice.

d. (3 points) Given your answers to the previous two questions, do you expect the above regression (a simple regression of Y on P) to over-estimate, under-estimate, or provide an unbiased estimate of the true causal effect of participation in the program on farm household income? Explain your logic completely. Two to four sentences should suffice.

[Remember: To sign a bias, we must sign the two terms that are multiplied together in the bias term and then identify whether the product of those two signs is positive or negative. To determine whether the bias will lead us to overestimate or underestimate an effect, we must also identify whether the true effect we are trying to estimate is positive or negative (because, for example, a negative bias means under-estimating if the true effect is positive and over-estimating if the true effect is negative). This means that a complete answer should:

discuss the sign of the effect of the omitted variables on Y
discuss the sign of the correlation between the omitted variables and the included variable of interest (P)
discuss the sign of the product of the effects discussed in the two previous bullets
discuss the likely sign of the true causal effect of P on Y, and whether a bias with the sign you just identified would imply a tendency to over-state or under-state the true effect.]
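The two terms mentioned in the reminder can be written out explicitly. As a sketch, assume a single omitted variable X (a stand-in for farm size, land quality, or education) and an error term u that is uncorrelated with both P and X. The symbols X, \beta_P, \beta_X, and \hat{b}_P are notation introduced here for illustration, not the problem set's own.

```latex
% Assumed "long" regression, with X standing in for one omitted variable
Y = \beta_0 + \beta_P P + \beta_X X + u

% Large-sample value of the slope from the "short" regression of Y on P alone
\operatorname{plim}\, \hat{b}_P
  = \beta_P + \beta_X \cdot \frac{\operatorname{Cov}(P, X)}{\operatorname{Var}(P)}
```

Because Var(P) is positive, the sign of the bias term is the sign of \beta_X multiplied by the sign of the correlation between P and X.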

2. Consider a different agricultural extension program. Because of government budget constraints, this program will be rolled out in some communities but not others within a district. The program personnel in charge of selecting communities and setting up the program know that their superiors will perform field visits one year after the program is rolled out. They also know that they will have better prospects for promotions and pay raises if, when their superiors visit farms in the communities that participated in the program, they see fields that look very productive. Whether unconsciously or consciously, this leads them to choose to roll the program out in communities with relatively rich soils and good connections to markets, where the fields are likely to look especially productive one year from now (whether the program has any impact or not!).

Again, we wish to estimate the impact (i.e. causal effect) of this program on farm household incomes. One year after the program was rolled out, we draw a simple random sample of farm households within the district. In the sample, as in the population, some farmers are from communities where the program was rolled out and participated in the program, while other farmers are from non-program communities and did not participate. We run this regression

reg Y P

where Y is a measure of household income and P is a dummy variable equal to 1 if the farmer participated in the program and 0 if not.

a. (5 points) Given what you know about the process through which communities and farmers were selected into this program, do you expect the simple regression above to under-state, over-state, or provide an unbiased estimate of the program's impact on household incomes? Explain your logic completely (see 1d above). Several sentences should suffice.

Now suppose that we devise a new plan for evaluating the impact of this program on farm income Y. Program staff are still in charge of determining which communities will receive the program and which will not, and still face the same incentives to improve their promotion prospects by placing the program in communities with relatively good soil and market connections. We the researchers, however, after learning which communities will receive the program but before the program is rolled out, select a random sample of farmers in communities that will receive the program and a random sample of farmers in communities that will not, and gather data on Y for all farmers in these samples. One year after the program has been rolled out, we will draw a new random sample of farmers from the communities in which the program was rolled out and a new random sample of farmers from the communities in which the program was not rolled out, and again collect data on Y from each farmer. We merge all the data from before and after the program was rolled out into a single dataset that includes the following variables:

Y_i = farm income on farm i

T_i = indicator equal to 1 if the farmer of observation i is in a community that participated in the program (whether the observation comes from before or after the program is rolled out) and 0 if the farmer is in a community that did not receive the program.

F_i = indicator equal to 1 if the farm in observation i is from a follow-up sample (collected after the program was rolled out) and 0 if it is from a baseline sample (collected before the program was rolled out).

We run the regression

Y_i = β₀ + β₁ T_i + β₂ F_i + β₃ T_i*F_i + u_i.
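The arithmetic behind a regression of this form can be sketched in Python with the standard library alone. The dataset below is invented (50 observations in each of the four T-by-F cells, with arbitrary numbers), and the helpers `solve` and `cell_mean` and the names `b0` to `b3` are hypothetical, introduced only for this illustration. The sketch shows how the intercept and the coefficients on T_i, F_i, and T_i*F_i relate to the four cell means. It is an algebraic identity for this specification and does not by itself establish that any estimate is unbiased.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical repeated cross-section: each row is (Y, T, F).
# All numbers are invented for illustration only.
rows = []
for T in (0, 1):
    for F in (0, 1):
        for _ in range(50):
            y = 1000 + 200 * T + 50 * F + 80 * T * F + random.gauss(0, 100)
            rows.append((y, T, F))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Design matrix columns: constant, T, F, T*F.
X = [[1.0, T, F, T * F] for _, T, F in rows]
Y = [y for y, _, _ in rows]

# The normal equations (X'X) b = X'Y give the OLS coefficients.
XtX = [[sum(x[i] * x[j] for x in X) for j in range(4)] for i in range(4)]
XtY = [sum(x[i] * y for x, y in zip(X, Y)) for i in range(4)]
b0, b1, b2, b3 = solve(XtX, XtY)

def cell_mean(T, F):
    """Average Y among observations with the given T and F values."""
    return mean(y for y, t, f in rows if t == T and f == F)

# Before-to-after change in mean Y within each group of communities.
change_program = cell_mean(1, 1) - cell_mean(1, 0)
change_other = cell_mean(0, 1) - cell_mean(0, 0)

# The two printed numbers coincide up to floating-point rounding.
print(round(b3, 4), round(change_program - change_other, 4))
```

Because the regression has one coefficient per cell, the fitted values reproduce the four cell means exactly: the intercept is the baseline mean in non-program communities, the coefficient on T is the baseline gap between the two groups, the coefficient on F is the change over time in non-program communities, and the interaction coefficient is the difference between the two groups' changes.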

b. (2 points) In the research just described, are we undertaking a randomized control trial (RCT)? Explain.

c. (2 points) Is this dataset best described as a "single cross section dataset," a "repeated cross section dataset," a "panel dataset," or a "time series dataset"? Explain. One sentence should suffice.

d. (2 points) Which parameter in the model described above is the "difference-in-differences" estimate of the effect of the agricultural extension program? (No explanation is necessary. Simply indicate which coefficient.)

e. (2 points) The name "difference-in-differences" estimation indicates that we are comparing some kind of "difference" across two groups. What kind of difference are we comparing and for what two groups? Please be sure to tailor your answer to this application.

f. (4 points) What must be true for this difference-in-differences estimation to give us an unbiased estimate of the agricultural extension program's impact?

g. (3 points) What is the intuition regarding the way difference-in-differences estimation allows us to get an unbiased estimate of the causal effect of the program on farm income, if the condition you stated in part f is true?