Replication files for Intergenerational Income Mobility in France: A Comparative and Geographic Analysis
Authors: Gustave Kenedi and Louis Sirugue
Journal: Journal of Public Economics
Year: 2023
Should you notice any error(s) in our code or issues with our analysis, please reach out at gustavekenedi@gmail.com or louis.sirugue@psemail.eu; we'll be happy to discuss it with you and make any corrections if necessary.
This repository contains all the code files necessary to replicate our paper as well as the paper's figures. We provide R Markdown or Quarto notebooks, as well as the data outputs and code used to construct the figures in the paper.
You will need R and RStudio, both of which can be downloaded for free. Once all set up, the only package you need to manually install is librarian.
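As a minimal sketch of this setup step, the helper below installs only the packages that are not already available. The function name `ensure_installed` and its `install` argument are hypothetical and not part of the repository; they are only there to make the logic easy to check.

```r
# Install any of `pkgs` that cannot be loaded yet.
# `install` defaults to utils::install.packages; it is an argument only so
# that the logic can be exercised without touching CRAN.
ensure_installed <- function(pkgs, install = utils::install.packages) {
  # TRUE for each package whose namespace is not available
  is_missing <- !vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[is_missing]
  if (length(missing_pkgs) > 0) {
    install(missing_pkgs)
  }
  # Return (invisibly) the packages that had to be installed
  invisible(missing_pkgs)
}

# Typical use in an interactive session (needs a CRAN mirror and network):
# ensure_installed("librarian")
```

Once librarian is installed, `librarian::shelf()` can install and attach further packages on first use, which is presumably why no other manual installation is required.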
Three sets of codes and/or outputs are provided:
- intergen_fr/: the analysis to compute estimates of intergenerational mobility in France using the Permanent Demographic Sample (EDP).
- psid_validation/: the analysis to validate the two-sample two-stage least squares (TSTSLS) methodology using the Panel Study of Income Dynamics (PSID).
- figures/:
  - code.rmd: code to re-create (almost) all the figures in the paper (main text + appendices). The figures in Appendix B (PSID Validation Exercise) are created directly within the code of the PSID validation exercise (see below).
  - figures_paper/: PDF figures from the paper (main text + appendices).
  - other_data/: other data files necessary to create the figures (shapefiles, data from Chetty et al. (2020), etc.).
  - out/: folder to store the figures when you run code.rmd. This avoids overwriting the original figures.
Data used: for this analysis we use the 2017 version of the Permanent Demographic Sample (EDP). This dataset can be accessed from the French Secure Data Access Center (CASD). Additional information on the EDP can be found here, and all the documentation in French is available here. Note that new versions of the EDP have become available since we started working on this project and we have not checked whether our code is compatible with newer versions.
The folder intergen_fr/ contains two subfolders:
- code/: all cleaning and analysis code.
  - 0_convert_src_data.sas: converts the raw SAS data files to csv.
  - 1_data_cleaning.Rmd: data cleaning code.
  - 2_article.Rmd: all results from the paper.
  - 3_bootstrap.Rmd and 4_master_bootstrap.R: bootstrap standard errors.
- figures_data/: underlying data for the figures in the paper.
Data used: for this analysis we use the Panel Study of Income Dynamics (PSID). This dataset can be accessed from the University of Michigan's Institute for Social Research. We use the raw version of the PSID which can be downloaded here. We downloaded:
- every family file (Main Study/Family Files) from 1968 to 2019 (without the wealth supplements, which we do not use), and
- the cross-year individual file (Main Study/Cross-year Individual) 1968-2019 (now 2021).
We also make use of an Excel version of the PSID cross-year index, available in /psid_validation/data/psid.xlsx. Note that a potentially updated version of this file can be found here. To replicate our analysis, we recommend using our file, since we have not checked whether the code runs with the (potentially) newer version. Note that you'll need to change the paths in the qmd files manually.
In order to use the PSID data in R, we ran the code in code/sas_to_csv.R. This code converts the sas files to csv. It unfortunately takes quite a while to run (a couple of hours if I recall correctly), but it should work. If you know of a simpler way of doing this, please let us know. Once all this is done, you should be able to open the PSID data files in R.
The folder psid_validation/ contains three subfolders:
- code/: cleaning and analysis code.
- code_lib/: any functions created for the analysis.
- data/: only contains the very useful psid.xlsx Excel file, a cross-year index of variables which we make extensive use of.
Within code/, you need to run the files in the following order:
- income_data.qmd: generates a dataset containing the various income definitions we use.
- psid_sample.qmd: generates datasets containing the parent and child samples.
- psid_variables.qmd: creates the parent characteristics necessary for the TSTSLS method.
After that, you should be able to run both psid_results.qmd, which contains the code for the results of the TSTSLS validation exercise, and psid_desc_stats.qmd, which contains the code for the descriptive statistics.
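The run order above could be scripted along the following lines. This is only a sketch: the helper `render_in_order` is hypothetical, it assumes the working directory is the repository root, and it assumes the quarto R package (with `quarto::quarto_render()`) and the Quarto CLI are installed. The repository itself does not ship such a script.

```r
# Files in the order given in this README: three preparation steps,
# then the two notebooks that depend on them.
psid_steps <- c(
  "income_data.qmd",
  "psid_sample.qmd",
  "psid_variables.qmd",
  "psid_results.qmd",
  "psid_desc_stats.qmd"
)

# Render each file in turn. An error in one step stops the loop, so later
# steps never run on missing or stale inputs.
# `render` defaults to quarto::quarto_render and is an argument only so the
# ordering logic can be checked without Quarto installed.
render_in_order <- function(files,
                            dir = file.path("psid_validation", "code"),
                            render = quarto::quarto_render) {
  paths <- file.path(dir, files)
  for (path in paths) {
    message("Rendering ", path)
    render(path)
  }
  invisible(paths)
}

# Typical use, after the paths inside the qmd files have been adjusted:
# render_in_order(psid_steps)
```

The paths inside the qmd files still have to be edited by hand before any of this will run on your machine.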
- Figures themselves: all the figures from the paper (main text and appendices) can be found in figures/figures_paper/.
- Figure codes: the code to create the figures can be found in code.rmd. The best way to use this code while avoiding issues with paths is to double-click on replication_files.Rproj to open this project in RStudio, and then simply run code.rmd; all the figures will be saved in figures/out/.