Replication package for "Expatriate Managers: Effects on Firm Performance"


  • Koren, Miklós (Corresponding author)
  • Telegdy, Álmos
  • Závecz, Gergő


Data Availability and Provenance Statements

Statement about Rights

  • I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.
  • I certify that the author(s) of the manuscript have documented permission to redistribute/publish the data contained within this replication package. Appropriate permissions are documented in the LICENSE.txt file.

License for Data

The data are licensed under a Creative Commons/CC-BY-NC license. See LICENSE.txt for details.

Summary of Availability

  • Some data cannot be made publicly available.
  • Confidential data used in this paper and not provided as part of the public replication package will be preserved for 5 years after publication, in accordance with journal policies.

Details on each Data Source

Dataset list

Data file | Source | Notes | Provided
data/raw/lbd.dta | LBD | Confidential | No
data/raw/terra.dta | IPUMS Terra | As per terms of use | Yes
data/derived/regression_input.dta | All listed | Combines multiple data sources; serves as input for Tables 2 and 3 and Figure 5. | Yes
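
Before running anything, a replicator may want to confirm which of the files listed above are actually in place. The following is a minimal Python sketch and not part of the replication package: the function name `check_data_files` and the assumption that it is run from the package root are illustrative only. The file paths and their "Provided" status are taken from the dataset list.

```python
from pathlib import Path

# Files from the dataset list above: path -> provided in the public package?
DATA_FILES = {
    "data/raw/lbd.dta": False,  # confidential, not provided
    "data/raw/terra.dta": True,
    "data/derived/regression_input.dta": True,
}


def check_data_files(root="."):
    """Return (missing_public, missing_confidential) lists of relative paths."""
    root = Path(root)
    missing_public, missing_confidential = [], []
    for rel_path, provided in DATA_FILES.items():
        if (root / rel_path).is_file():
            continue  # file is present, nothing to report
        if provided:
            missing_public.append(rel_path)
        else:
            missing_confidential.append(rel_path)
    return missing_public, missing_confidential


if __name__ == "__main__":
    public, confidential = check_data_files()
    for path in public:
        print(f"missing (should ship with the package): {path}")
    for path in confidential:
        print(f"missing (confidential, must be obtained separately): {path}")
```

Reporting the two groups separately keeps a missing confidential file, which is expected in the public package, from being mistaken for a packaging error.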

Computational requirements

Software Requirements

All the Stata packages can be installed by running code/util/ or make install.

The package includes a Makefile, which can be used to run all the code in the proper order. This requires Linux or Mac.

Controlled Randomness

  • Random seed is set at line _____ of program ______
  • No pseudo-random number generator is used in the analysis described here.

Memory, Runtime, Storage Requirements

Approximate time needed to reproduce the analyses:

  • < 10 minutes
  • 10-60 minutes
  • 1-2 hours
  • 2-8 hours
  • 8-24 hours
  • 1-3 days
  • 3-14 days
  • > 14 days

Approximate storage space needed:

  • < 25 MB
  • 25 MB - 250 MB
  • 250 MB - 2 GB
  • 2 GB - 25 GB
  • 25 GB - 250 GB
  • > 250 GB
  • Not feasible to run on a desktop machine, as described below.


The code was last run on a 4-core Intel-based laptop running macOS 10.14.4 with 200 GB of free space.

Portions of the code were last run on a 32-core Intel server with 1024 GB of RAM and 12 TB of fast local storage. Computation took 734 hours.

Portions of the code were last run on a 12-node AWS R3 cluster with 2 TB of attached storage, consuming 20,000 core-hours.

Description of programs/code

  • Programs in programs/01_dataprep will extract and reformat all datasets referenced above. The file programs/01_dataprep/ will run them all.
  • Programs in programs/02_analysis generate all tables and figures in the main body of the article. The program programs/02_analysis/ will run them all. Each program it calls identifies the table or figure it creates. Output files have descriptive names (e.g., table5.tex, figure12.png) and should be easy to match to the manuscript.
  • Programs in programs/03_appendix will generate all tables and figures in the online appendix. The program programs/03_appendix/ will run them all.
  • Ado files have been stored in programs/ado and the files set the ADO directories appropriately.
  • The program programs/ will populate the programs/ado directory with updated ado packages, but for purposes of exact reproduction, this is not needed. The file programs/00_setup.log identifies the versions as they were last updated.
  • The program programs/ contains parameters used by all programs, including a random seed. Note that the random seed is set once for each of the two sequences (in 02_analysis and 03_appendix). If running in any order other than the one outlined below, your results may differ.
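
The note on run order can be illustrated with a small sketch. It uses Python's `random` module as a stand-in for Stata's generator, and the seed value and step functions are hypothetical, not taken from the package. When the seed is set once at the start of a sequence, every step draws from the same stream, so swapping two steps changes the numbers each one receives.

```python
import random

SEED = 12345  # hypothetical value; the package sets its own seed


def step_a(rng):
    """Stand-in for one program that uses random draws."""
    return [rng.random() for _ in range(3)]


def step_b(rng):
    """Stand-in for a second program that uses random draws."""
    return [rng.random() for _ in range(3)]


def run_sequence(steps):
    """Seed once, then run the steps in the given order."""
    rng = random.Random(SEED)  # seed set once for the whole sequence
    return {step.__name__: step(rng) for step in steps}


documented = run_sequence([step_a, step_b])
repeated = run_sequence([step_a, step_b])
swapped = run_sequence([step_b, step_a])

# Same order, same seed: the draws are identical.
print(documented == repeated)
# Different order: step_a now receives the draws step_b received before.
print(documented["step_a"] == swapped["step_a"])
```

The first comparison holds and the second does not, which is why results may differ when programs within a sequence are run out of order.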

Instructions to Replicators

  • Edit programs/ to adjust the default path
  • Run programs/ once on a new system to set up the working environment.
  • Download the data files referenced above. Each should be stored in the prepared subdirectories of data/, in the format that you download them in. Do not unzip. Scripts are provided in each directory to download the public-use files. Confidential data files requested as part of your FSRDC project will appear in the /data folder. No further action is needed on the replicator's part.
  • Run programs/ to run all steps in sequence.


  • programs/ will create all output directories and install the needed ado packages.
    • If wishing to update the ado packages used by this archive, change the parameter update_ado to yes. However, this is not needed to successfully reproduce the manuscript tables.
  • programs/01_dataprep:
    • These programs were last run at various times in 2018.
    • Order does not matter; all programs can be run in parallel if needed.
    • The file programs/01_dataprep/ will run them all in sequence, which should take about 2 hours.
  • programs/02_analysis/
    • If running programs individually, note that ORDER IS IMPORTANT.
    • The programs were last run top to bottom on July 4, 2019.
  • programs/03_appendix/ The programs were last run top to bottom on July 4, 2019.
  • Figure 1: The figure can be reproduced using the data provided in the folder “2_data/data_map”, and ArcGIS Desktop (Version 10.7.1) by following these (manual) instructions:
    • Create a new map document in ArcGIS ArcMap, browse to the folder “2_data/data_map” in the “Catalog”, with files "provinceborders.shp", "lakes.shp", and "cities.shp".
    • Drop the files listed above onto the new map, creating three separate layers. Order them with "lakes" in the top layer and "cities" in the bottom layer.
    • Right-click on the cities file, in properties choose the variable "health"... (more details)

The provided code reproduces:

  • All numbers provided in text in the paper
  • All tables and figures in the paper
  • Selected tables and figures in the paper, as explained and justified below.

Figure/Table # | Program | Line Number | Output file | Note
Table 1 | 02_analysis/ | | summarystats.csv |
Table 2 | 02_analysis/ | 15 | table2.csv |
Table 3 | 02_analysis/ | 145 | table3.csv |
Figure 1 | n.a. (no data) | | | Source: Herodus (2011)
Figure 2 | 02_analysis/ | | figure2.png |
Figure 3 | 02_analysis/ | | figure-robustness.png | Requires confidential data
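
After a full run, the output files named in the table above can be checked mechanically. The following is a minimal Python sketch and not part of the replication package: the function name `missing_outputs` is hypothetical, and the output directory is passed in as an argument because its location is not stated here.

```python
from pathlib import Path

# Output files named in the table above (Figure 1 has no program output).
EXPECTED_OUTPUTS = {
    "Table 1": "summarystats.csv",
    "Table 2": "table2.csv",
    "Table 3": "table3.csv",
    "Figure 2": "figure2.png",
    "Figure 3": "figure-robustness.png",  # requires confidential data
}


def missing_outputs(output_dir):
    """Return the exhibits whose output file is absent from output_dir."""
    output_dir = Path(output_dir)
    return [
        exhibit
        for exhibit, filename in EXPECTED_OUTPUTS.items()
        if not (output_dir / filename).is_file()
    ]
```

Without access to the confidential data, a replicator should expect "Figure 3" to remain in the returned list.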


Steven Ruggles, Steven M. Manson, Tracy A. Kugler, David A. Haynes II, David C. Van Riper, and Maryia Bakhtsiyarava. 2018. "IPUMS Terra: Integrated Data on Population and Environment: Version 2 [dataset]." Minneapolis, MN: Minnesota Population Center, IPUMS.

Department of Elementary and Secondary Education (DESE). 2019. "Student outcomes database [dataset]." Massachusetts Department of Elementary and Secondary Education (DESE). Accessed January 15, 2019.

U.S. Bureau of Economic Analysis (BEA). 2016. "Table 30: Economic Profile by County, 1969-2016." Accessed September 1, 2017.

Inglehart, R., C. Haerpfer, A. Moreno, C. Welzel, K. Kizilova, J. Diez-Medrano, M. Lagos, P. Norris, E. Ponarin & B. Puranen et al. (eds.). 2014. World Values Survey: Round Six - Country-Pooled Datafile Version: Madrid: JD Systems Institute.


Some content on this page was copied from Hindawi. Other content was adapted from Fort (2016), Supplementary data, with the author's permission.