This archive contains the shareable data and code used in the reanalysis of Duflo (2001), "Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment."
Most of the data used in the original study and the reanalysis cannot be shared because of restrictions in the licenses from Badan Pusat Statistik (BPS). The NBER has licensed the 1995 intercensal survey (SUPAS) data and makes it available to NBER affiliates. At this writing, BPS does not make the 1995 SUPAS and other older data sets available to researchers. IPUMS freely distributes a large subset of the 1995 SUPAS data, which is missing many smaller regencies, or perhaps the most lightly weighted observations in general. The code posted here is written to work with the IPUMS subset as well, but produces different results when run on it.
The reanalysis also uses the 2005 SUPAS, 2010 SAKERNAS, and 2013-14 SUSENAS survey data sets. The 2005 SUPAS data at IPUMS is used for the first. The other two must be bought from BPS: the district-level 2010 SAKERNAS and the district-level 2013 and 2014 SUSENAS. To save money, buy only variables UMUR, JK, WEIGHT, B1P01, B1P02, B5P1A, B5P6B, B5P12A, and B5P12B from SAKERNAS; and UMUR, JK, FWT_TAHUN, B5_TL1, B5_TL2, B5R15, B5R29, and B5R28B from SUSENAS (plus HB, to follow Hsiao in restricting to heads of household).
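Before running the analysis, it may help to confirm that a purchased extract contains every variable listed above. The following is a minimal sketch in Python, not part of the archive. It assumes the extract has been exported to CSV with variable names in the first row; the sample header and the function name missing_variables are hypothetical.

```python
import csv
import io

# Variable names copied from the lists above.
SAKERNAS_VARS = ["UMUR", "JK", "WEIGHT", "B1P01", "B1P02",
                 "B5P1A", "B5P6B", "B5P12A", "B5P12B"]
SUSENAS_VARS = ["UMUR", "JK", "FWT_TAHUN", "B5_TL1", "B5_TL2",
                "B5R15", "B5R29", "B5R28B"]


def missing_variables(header, required):
    """Return the required variable names absent from a file's header row."""
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical header row standing in for a purchased SAKERNAS extract
# (assumption: CSV export with variable names in the first row).
sample = io.StringIO("UMUR,JK,WEIGHT,B1P01,B1P02,B5P1A,B5P6B,B5P12A\n")
header = next(csv.reader(sample))
print(missing_variables(header, SAKERNAS_VARS))
```

Matching is exact and case-sensitive, since Stata variable names are case-sensitive. An empty list means that all required variables are present.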
The "Regency-level vars" files contain figures on population, school attendance, planned school construction, and water and sanitation spending. The Duflo (2001) versions of the variables, which have been used in many studies, are here copied from the public data archive of Ashraf et al. (2020). The new versions carry the suffix "new". Images of the government documents they were reconstructed from are in the "Printed sources" folder.
Regency and municipality boundaries in Indonesia have changed over time, mostly through subdivision, occasionally through merger. This complicates linking regency-level data from the 1971 census and mid-1970s presidential directives to the follow-ups in 1995, 2005, 2010, and 2013-14. IPUMS helpfully provides shapefiles that modern database and GIS software can use to make the linkages. The concordances folder contains concordances linking the 1995 coding to the 2005 and 2010-14 codings. The 1970s data are manually coded with respect to 1995. Notes in "Baseline variable reconstruction.xlsx" in the "Regency-level vars" folder document complications in this coding, including a few cases where the original and new codings differ.
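As an illustration of how such a concordance is applied, the sketch below assigns each follow-up record the 1995 code of its regency. It is not taken from the archive's do file. It assumes a many-to-one mapping from later codes to 1995 codes, which fits subdivisions; mergers would need extra handling. The field names and all codes shown are invented for the example.

```python
def assign_1995_codes(rows, concordance, code_field="regency_code"):
    """Attach the 1995 regency code to each record.

    rows: list of dicts, one per survey record (hypothetical layout).
    concordance: dict mapping a later-year regency code to its 1995 code.
    Returns the linked records and a sorted list of codes with no match.
    """
    linked, unmatched = [], set()
    for row in rows:
        base = concordance.get(row[code_field])
        if base is None:
            # Keep track of codes the concordance does not cover.
            unmatched.add(row[code_field])
            continue
        linked.append({**row, "regency_code_1995": base})
    return linked, sorted(unmatched)


# Invented toy codes: 1102 stands for a regency split off from 1101 after 1995.
concordance = {1101: 1101, 1102: 1101, 1201: 1201}
rows = [
    {"regency_code": 1101, "age": 40},
    {"regency_code": 1102, "age": 35},
    {"regency_code": 9999, "age": 50},  # not in the concordance
]
linked, unmatched = assign_1995_codes(rows, concordance)
print(linked)
print(unmatched)
```

Reporting unmatched codes separately, instead of silently dropping them, makes it easier to spot regencies that the concordance does not cover.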
"Duflo 2001.do" is a Stata do file that generates all results.