This repository contains the data and code underlying the paper "The Comparative Advantage of Cities" by Donald Davis and Jonathan Dingel. This replication package was prepared by Jonathan Dingel, with assistance from Dylan Clarke, Luis Costa, Antonio Miscio, and Shirley Yarin.
Our project is organized as a series of tasks.
The main project directory contains 9 folders that represent 9 tasks.
Each task folder contains three folders: `input`, `code`, and `output`.
A task's output is used as an input by one or more downstream tasks.
This graph depicts the input-output relationships between tasks.
We use Unix's `make` utility to automate this workflow.
After downloading this replication package (and installing the relevant software), you can reproduce the figures and tables appearing in the published paper and the online appendix simply by typing `make` at the command line.
The project's tasks are implemented via Stata code and GNU/Linux shell scripts. In particular, we used Stata 15 and GNU bash version 4.2.46(2). The taskflow structure employs symbolic links.
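As a rough illustration of how a taskflow built on symbolic links can work, the sketch below exposes one task's `output` file inside another task's `input` folder. The task names `upstream_task` and `downstream_task` and the file name `results.dta` are hypothetical and chosen only for illustration; this is not the package's own code.

```shell
# Hypothetical sketch: link an upstream task's output into a downstream task's input.
# The task and file names below are illustrative and do not come from the package.
workdir="$(mktemp -d)"
mkdir -p "$workdir/upstream_task/output" "$workdir/downstream_task/input"

# Pretend the upstream task has produced a file.
echo "placeholder" > "$workdir/upstream_task/output/results.dta"

# The downstream task sees that file through a relative symbolic link,
# so no copy is made and the dependency stays visible in the folder layout.
ln -s ../../upstream_task/output/results.dta "$workdir/downstream_task/input/results.dta"

# List the link and its target.
ls -l "$workdir/downstream_task/input"
```

Because the link is relative, the whole working directory can be moved without breaking it. File systems that do not support symbolic links would need a different approach.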
Note to Mac OS X users:
The code presumes that Stata scripts can be run from Terminal via the command `stata-se`.
Please follow the instructions for Running Stata from the Terminal.
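A quick way to confirm that the `stata-se` command is reachable before starting a long run is sketched below. The helper name `have_command` is our own and is not part of the package; the check only tests whether the command is on your `PATH`, not whether your Stata license or version is suitable.

```shell
# Hypothetical helper (not part of the package): succeed if a command is on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command stata-se; then
    echo "stata-se found at: $(command -v stata-se)"
else
    echo "stata-se not found on PATH; Stata scripts will not run from the terminal" >&2
fi
```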
- Download (or clone) this repository by clicking the green `Clone or download` button above. Uncompress the ZIP file into a working directory on your cluster or local machine.
- Download the IPUMS micro data from https://usa.ipums.org/usa/. You will need to register as an IPUMS-USA user in order to download the public-use micro data from the 1980 and 2000 Census of Population releases. See details below.
- (Optional) If you will be running your jobs using Slurm on a computing cluster, edit the file `commoncode/code/run.sbatch` to specify the `#SBATCH --partition=` name.
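If you prefer to set the partition from the command line rather than in an editor, something like the following sketch may help. It assumes the partition directive sits on its own line beginning with `#SBATCH --partition=`; the helper name `set_partition` and the partition name `mypartition` are placeholders, and partition names containing `/` would need a different `sed` delimiter.

```shell
# Hypothetical helper (not part of the package): rewrite the partition line of an sbatch file.
# Usage: set_partition FILE PARTITION_NAME
set_partition() {
    file="$1"
    partition="$2"
    tmp="$(mktemp)"
    # Replace whatever follows "--partition=" on the #SBATCH line; other lines are untouched.
    # Writing back with cat keeps the original file's permissions.
    sed "s/^#SBATCH --partition=.*/#SBATCH --partition=${partition}/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example, run from the top of the working directory
# ("mypartition" is a placeholder for your cluster's partition name):
# set_partition commoncode/code/run.sbatch mypartition
```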
- The files `CAC_IPUMS_1980_Codebook.txt` and `CAC_IPUMS_2000_Codebook.txt` in `initialdata/input` contain lists of the variables to download.
- Do not extract any extra variables beyond the ones listed. The scripts `CAC_PREP_IPUMS_1980.do` and `CAC_PREP_IPUMS_2000.do` within `initialdata/code` make assumptions about the contents of these files.
- Rename the `.dat` files that you download from IPUMS as `IPUMS_1980.dat` and `IPUMS_2000.dat` and place them in the `initialdata/input` folder.
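Before running the code, you may want to confirm that both renamed extracts are in place. The sketch below is one way to do so; the helper name `check_ipums_inputs` is our own and is not part of the package. It only checks that the two files exist, not that they contain the right variables.

```shell
# Hypothetical helper (not part of the package): confirm the renamed IPUMS extracts exist.
# Usage: check_ipums_inputs DIR   (DIR would normally be initialdata/input)
check_ipums_inputs() {
    dir="$1"
    missing=0
    for f in IPUMS_1980.dat IPUMS_2000.dat; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 if both files are present, 1 otherwise.
    return "$missing"
}

# Example, run from the top of the working directory:
# check_ipums_inputs initialdata/input && echo "IPUMS inputs in place"
```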
Typing `make` in the working directory at the Linux/Mac OS X command line will execute all the project code.
- It is best to replicate the project using the `make` approach described above. Nonetheless, it is also possible to produce the results task-by-task in the order depicted in the flow chart. If all upstream tasks have been completed, you can complete a task by navigating to the task's `code` directory and typing `make`.
- The `downloaddata` and `install_packages` tasks require an internet connection.
- The task `permutation_tests` is slow. It involves 30 (parallel) jobs that take about 12 hours each.
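For the task-by-task route, a small wrapper along the following lines can save some typing. The function name `run_task` is our own and is not part of the package; it assumes you call it from the top of the working directory and that every upstream task has already been completed.

```shell
# Hypothetical helper (not part of the package): run one task from its code directory.
# Usage: run_task TASK_FOLDER   (e.g. run_task initialdata)
run_task() {
    task="$1"
    if [ ! -d "$task/code" ]; then
        echo "no code directory for task: $task" >&2
        return 1
    fi
    # Run make in a subshell so the caller's working directory is unchanged.
    ( cd "$task/code" && make )
}

# Example: run_task install_packages
```

The wrapper does not check upstream dependencies itself, so the order shown in the flow chart still has to be followed by hand.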