Model Series: Difference-in-Differences | An Intuitive Introduction to DID
This article is taken from my reading notes. Its content is based on slides written by Jeff Wooldridge and a video shared by Douglas McKee, a senior lecturer in the Cornell Economics department. The note is also published on the WeChat Official Account Think_Lab.
In the basic setting, outcomes are observed for two groups for two time periods. One of the groups is exposed to a treatment in the second period but not in the first period. The second group is not exposed to the treatment during either period. This structure can apply to repeated cross sections or panel data (Jeff Wooldridge, 2011).
Suppose Sao Paulo (Brazil) institutes a free lunch program in elementary schools in 2009. There are many reasons to expect students to perform better in school if they are guaranteed a free meal, but how large might such effects be? Suppose also that Brazilian fifth graders take a standardized math test at the end of every year. How can we evaluate the effect of the free lunch program on test scores?
One way to evaluate the program would be to compare the test scores of kids in Sao Paulo in 2010 with the scores in Sao Paulo in 2008, before the program was implemented. Call this difference D1. It is presumably due in part to the program, but only in part. Suppose there was an important international soccer tournament during the week of the exam in 2008 but not in 2010. The tournament might also influence the difference in test scores between the two periods. So D1 contains both the program effect and what we will call the trend: everything else that happened over the same period.
Suppose we also observe test scores in 2008 and 2010 in Rio, another large city in Brazil. If we are willing to assume that the difference across time in Rio reflects what would have happened in Sao Paulo without the program, then we can use that difference (D2) as an approximation of the trend. Subtracting it gives our difference-in-differences estimate: D1 - D2.
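The arithmetic above can be sketched in a few lines. The average scores below are made up purely for illustration; they are not from the source.

```python
# Hypothetical average math scores (invented for illustration only).
scores = {
    ("Sao Paulo", 2008): 60.0,
    ("Sao Paulo", 2010): 68.0,
    ("Rio", 2008): 62.0,
    ("Rio", 2010): 65.0,
}

# D1: change over time in the treated city (program effect + trend).
d1 = scores[("Sao Paulo", 2010)] - scores[("Sao Paulo", 2008)]

# D2: change over time in the comparison city (trend only, by assumption).
d2 = scores[("Rio", 2010)] - scores[("Rio", 2008)]

# Difference-in-differences estimate of the program effect.
did = d1 - d2

print(f"D1 = {d1}, D2 = {d2}, DID = {did}")
```

With these invented numbers, the naive before/after comparison D1 would overstate the program effect, because part of it is the trend that Rio also experienced.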
DID1.png
When is diff-in-diff (DID) useful?
- You want to evaluate a program or treatment
- You have treatment and control groups
- You observe them before and after
BUT
- Treatment is not random
- Other things were happening while the program was in effect
- You can't control for all the potential confounders
Key Assumption
The trend in the control group approximates what would have happened in the treatment group in the absence of the treatment
Diff-in-diff data needs
You need data on
- The treatment group and a control group
- Covering the periods before and after the treatment (i.e. the program intervention)
Structure
- Aggregate data or
- Pooled cross-section or
- Longitudinal data
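As a rough illustration of these data structures, the sketch below starts from a small pooled cross-section (one row per student) and collapses it to aggregate data (one mean per group and period). All scores, and the names `PROGRAM_YEAR`, `TREATED_CITY`, `treat`, and `post`, are assumptions made for this example.

```python
from collections import defaultdict

PROGRAM_YEAR = 2009          # the program starts in 2009 in the example
TREATED_CITY = "Sao Paulo"   # the city that receives the program

# Hypothetical pooled cross-section: (city, year, score), one row per student.
# Different students may be sampled in each year; scores are invented.
rows = [
    ("Sao Paulo", 2008, 58.0), ("Sao Paulo", 2008, 62.0),
    ("Sao Paulo", 2010, 66.0), ("Sao Paulo", 2010, 70.0),
    ("Rio", 2008, 61.0), ("Rio", 2008, 63.0),
    ("Rio", 2010, 64.0), ("Rio", 2010, 66.0),
]

# Attach the two indicator variables that DID relies on.
records = [
    {
        "treat": int(city == TREATED_CITY),  # 1 = treatment group
        "post": int(year >= PROGRAM_YEAR),   # 1 = after the intervention
        "score": score,
    }
    for city, year, score in rows
]

# Collapse to aggregate data: one mean score per (treat, post) cell.
cells = defaultdict(list)
for r in records:
    cells[(r["treat"], r["post"])].append(r["score"])
means = {cell: sum(v) / len(v) for cell, v in cells.items()}

for cell in sorted(means):
    print(cell, means[cell])
```

Longitudinal (panel) data would look the same except that each row would also carry a student identifier that repeats across years.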
DID3.png
We assume that test scores are determined by this regression model. If we take the conditional expected value of y, E[y|x], we get a linear combination of the coefficients, and the error term drops out because its conditional mean is assumed to be zero.
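The model itself appears only in the figure. As a sketch, assume it takes the standard two-group, two-period form that matches the coefficients named in the text. The variable names treat and post, the role of β2 as the baseline gap between groups, and the condition E[ε | treat, post] = 0 are assumptions here rather than details taken from the figure. Under those assumptions, the model and its four conditional means are:

```latex
\begin{aligned}
y &= \beta_0 + \beta_1\,\mathrm{post} + \beta_2\,\mathrm{treat}
     + \beta_3\,(\mathrm{post}\times\mathrm{treat}) + \varepsilon \\[4pt]
E[y \mid \mathrm{treat}=0,\ \mathrm{post}=0] &= \beta_0 \\
E[y \mid \mathrm{treat}=0,\ \mathrm{post}=1] &= \beta_0 + \beta_1 \\
E[y \mid \mathrm{treat}=1,\ \mathrm{post}=0] &= \beta_0 + \beta_2 \\
E[y \mid \mathrm{treat}=1,\ \mathrm{post}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
```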
The difference across time in the control group is β1, and the difference across time in the treatment group is β1 + β3. The difference between the two is β3, the DID estimate.
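To see that the interaction coefficient reproduces D1 - D2, here is a minimal sketch that fits the regression by ordinary least squares on invented student-level data. It assumes the model has an intercept, a post dummy, a treat dummy, and their interaction. The helper `ols` is a hypothetical pure-Python solver written for this example; in practice a regression library would normally be used instead.

```python
# Hypothetical student-level data: (treat, post, score). Scores are invented.
data = [
    (1, 0, 58.0), (1, 0, 62.0),   # treatment group, before
    (1, 1, 66.0), (1, 1, 70.0),   # treatment group, after
    (0, 0, 61.0), (0, 0, 63.0),   # control group, before
    (0, 1, 64.0), (0, 1, 66.0),   # control group, after
]

# Design matrix columns: intercept, post, treat, post * treat.
X = [[1.0, post, treat, post * treat] for treat, post, _ in data]
y = [score for _, _, score in data]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Scale the pivot row so the pivot element becomes 1.
        a[col] = [v / a[col][col] for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


b0, b1, b2, b3 = ols(X, y)

# Compare with the difference-in-differences of the four group means.
def mean_score(treat, post):
    vals = [s for t, p, s in data if t == treat and p == post]
    return sum(vals) / len(vals)

d1 = mean_score(1, 1) - mean_score(1, 0)   # change in the treatment group
d2 = mean_score(0, 1) - mean_score(0, 0)   # change in the control group

print(f"b1 = {b1:.3f}, b3 = {b3:.3f}, D1 - D2 = {d1 - d2:.3f}")
```

Because the model has one parameter per group-period cell, the fitted values equal the four cell means, so b1 matches the control-group change and b3 matches D1 - D2 up to floating-point rounding.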
DID4.png
Reference
1. Jeff Wooldridge (Michigan State University), "Difference-in-Differences Estimation", LABOUR Lectures, EIEF, October 18-19, 2011.