The GSLab Template is a minimal working demonstration of the tools and organization used by projects in the GSLab. We use SCons and a few custom builders to execute scripts and track dependencies in a portable and flexible manner.
You'll need the following to run the template. Homebrew for Mac and Linuxbrew for Linux make this easier.
- Windows
cmd.exe
, Mac OS Xbash
, or Linuxbash
. - Python 2.7.X and pip for Windows, Mac or Linux.
- git for version control.
- git-lfs for versioning large files.
- You'll need both git and git-lfs to clone the repository.
- LyX (with instructions for LaTeX)
- Add LyX to your PATH for Windows, Mac, and Linux.
- The beamer theme
metropolis
. This is part of MikTeX since Dec 2014.
- Open a shell, clone the repository, and navigate to its root.
git lfs clone https://github.com/gslab-econ/template.git YourProjectName cd YourProjectName
- Install Python dependencies (GSLab Python v4.1.0, PyYAML, numpy, and matplotlib).
# Store package names in this text file. pip install -r config/requirements.txt
- Unzip the scons package.
unzip config/scons.zip -d config/scons # Do manually on Windows
- Navigate to the
analysis
orpaper_slides
subdirectory.cd analysis
- You're ready to go. We'll prompt you to enter any necessary information and store it in
config_user.yaml
as your scripts run.- To build everything in the subdirectory that has been modified or with dependencies in the subdirectory that have been modified.
python run.py
- To build everything in a single directory of targets that has been modified and all of their dependencies that have been modified.
python run.py build/path/to/directory
- To build a single target that has been modified and all of its dependencies that have been modified.
python run.py build/path/to/file.txt
- To build everything in the subdirectory that has been modified or with dependencies in the subdirectory that have been modified.
If you want to create a repository with the same structure as this template you can fork it. If you want a repository without any of our git history, follow these instructions.
- Create an empty repository in GitHub and clone it.
- Copy the contents of this template into the empty repository. Make sure to exclude the
.git
folder, but include the.gitattributes
and.gitignore
files. - Commit the changes and push to the new repository (Do not commit
config_user.yaml
).
You can accomplish the last two bullets by running these (Bash) commands.
# Get new repository's name
echo "Enter the empty repository's name."
read REPO_NAME
# Clone new repository and template
git clone https://github.com/gslab-econ/$REPO_NAME
git lfs clone https://github.com/gslab-econ/template.git
# Copy template's contents to new repository and clean up
rm -rf template/.git
cp -a template/ $REPO_NAME/
rm -rf template
# Commit and push new repository
cd $REPO_NAME
git add .
git commit -m "Initialized repository with the gslab-econ template."
git push
Each user is allowed to have different local specifications: We don't put any restrictions on where you keep large files, what you call your executables, or how you manage shared directories. We do need to find these things, and that's what config_user.yaml
is for. Each user maintains an unversioned YAML file with these sorts of specifications. Each script uses its associated YAML-parsing module to read these specifications each time the script is run.
If you try to build a directory without config_user.yaml
, we'll copy a template to your current working directory. You can always switch back to this template by deleting your current config_user.yaml
and rerunning.
There's no "default" for config_user.yaml
because it depends on system specifications and user preferences. Three things we do recommend keeping in config_user.yaml
are the names of your executables, the location of a SCons cache directory, and the location of a release directory. These fields don't have to be specified if you're not using them, and we'll prompt you for their values at runtime if you've forgotten to specify them and they're necessary.
The config_global.yaml
tracks paths, specifications, variables, and software checks that are constant across users. We treat this file in the same manner as config_user.yaml
, except that we do version config_global.yaml
.
One important function of config_global.yaml
is that it tracks the version of gslab-python you expect your users to have installed.
We are agnostic about how you incorporate external data into the template. There's no custom builder for these assets, by design. Our suggestions:
-
When a large dataset is stored locally,
config_user.yaml
can include an entry specifying the user-specific path to that dataset. The key of the entry should be constant across users and documented in the top-level readme of the repository. -
When a large dataset is stored externally, there are a few options.
- The top-level readme can specify manual download and storage instructions. This is simple, easy to customize, and unlikely to cause errors during a SCons build. It does, however, require each user to successfully download the same dataset, perhaps in an unstructured manner.
- The download can be incorporated into the SCons build. We either execute a program to transfer data (e.g.,
rsync
orrclone
) using our "Anything builder" or from within a script executed by one of our other custom builders. These methods have the benefits of automation and dependency tracking, but they can introduce idiosyncratic errors if the download steps are prone to failure. - Regardless of the download method, the path to the dataset should be added to
config_global.yaml
and.gitignore
if it is stored within the repository and toconfig_user.yaml
if it is stored elsewhere.
-
Any pointers to directories you store under the
input
key in either configuration yaml file will have their contents checked and recorded at runtime. We complete the check using ourrecord_dir
function and store its contents inrelease/state_of_input.log
.
We recommend that you manually copy the desired files or directories from the release
directory in analysis
to a directory called input
in paper_slides
. We uncouple these top-level subdirectories to compartmentalize the research process. A busy coauthor can build paper_slides
in a reproducible manner without wrangling data or repeating a lengthy analysis.
There are lighter-weight methods to connect these subdirectories, but they may make your project more prone to errors.
- Create a symlink between
analysis/release
andpaper_slides/input
(not platform-independent). - Point to
analysis/release
from theconfig_global.yaml
inpaper_slides
(hard to parse YAML in LyX and LaTeX).
You can recouple the SCons subdirectories by installing the output from analysis
into paper_slides
Add the following line to the SConstruct in analysis
env.Install('../paper_slides/input', '#release/')
You'll also need to give SCons permission to build targets outside the analysis
directory; so run
python run.py ../paper_slides
We have custom builders for Python, R, Stata, and MATLAB. They all use the same syntax. You'll need to add the builder to the SCons environment by uncommenting its definition in the SConstruct. Also check that its executable has been added to your PATH and recorded in config_user.yaml
.
See analysis/source/prepare_data/
for sample scripts in each language. To run one of the sample scripts, uncomment its block in analysis/source/prepare_data/SConscript
and its builder in analysis/SConstruct
. Note that all sample scripts produce the same output, so only one block is allowed to run for each SCons build. If you uncomment the block for one software, you need to comment out the blocks for all others.
- If you're using R, make sure you've installed its dependencies.
Rscript config/config_r.r
- If you're using Stata, make sure the executable is stored in
config_user.yaml
with the keystata_executable
and that you've installed its dependencies.
statamp -e config/config_stata.do // Stata executable may be different
- If you're using Matlab, make sure it's been added to your PATH and that you have installed the Matlab YAML parser following the instructions here.
Yes. We also have the LaTeX builder. See paper_slides/source/paper
for a demonstration.
We support additional software through our "Anything builder." It let's you execute shell commands as a build step using the syntax, logging, and error management of our other custom builders. We turn your commands into a SCons node at runtime, and SCons will track your dependencies and build the node if necessary. SCons will always execute your commands using your system's shell, and you can use this functionality to incorporate arbitrary steps into your build.
An example using standard Bash commands is in analysis/source/prepare_data/SConscript
and analysis/source/prepare_data/build_data.sh
. This code won't run on Windows.
You bet. All of our custom builders accept "command line style" arguments with the same method. Enumerate the arguments in a list and pass them to the builder through the CL_ARG
keyword argument, exactly the same way you specify sources and targets. We'll format this list, and scons
will pass its contents to the script at runtime. You can reference these arguments when writing a script using the standard practice for its language.
Each of our custom builders produces a log of its process in the same directory as the first of its targets. Each log is named sconscript.log
by default, and you can insert custom text between sconscript
and .log
by passing it as a string through the builder's log_ext
keyword argument. It's similar to the way that you specify sources and targets, except that the log_ext
argument must be a string. You should specify the log_ext
argument for builders that produce logs in the same directory, otherwise the default sconscript.log
will be overwritten by each builder.
After all the steps in the build are completed, we'll comb through the directory and look for for any files named sconscript*.log
. These logs will be concatenated—with the earliest completed ones first and all logs with errors on top. We'll store this concatenated log in release/sconstruct.log
.
Yes, our custom tool allows you to release to GitHub and a local destination specified in config_user.yaml
. A new release can be transfered to a remote manually (e.g., using rsync
or rclone
) or automatically by specifying a local destination that's synced to a remote (e.g., a Dropbox directory).
Every file intended for release should be added to the release
directory. Files not intended for release to GitHub should be added to .gitignore
. Our tool will transfer everything in release
to the local destination and create a GitHub release with all the versioned files—those not added to .gitignore
—in release
.
Our custom tool creates a local release of the SCons subdirectory in which its run but a GitHub release of the entire repository. We therefore suggest you prepend the name of the SCons subdirectory to the title of the release.
Our protocol is to keep these files in a designated subdirectory named release/lg
. By default, release/lg
is in .gitignore
. When you use our custom release tool, release/lg
will be included in the local destination release but not pushed to GitHub.
That's fine, but you'll need version 2.4.0 or later. Just switch python run.py
to scons
. Everything else stays the same. You can also skip the unzip step of the Quick Start.
Be aware that the formatting of the cache changed in version 2.5.0 of SCons. A cache can get messy and fall out of sync if collaborators use a mix of older and newer versions.
The version of SCons packaged with this repository is scons-local-3.0.1. Our run.py
scripts are just wrappers around the scons.py
included in scons-local. These files exist only to execute SCons, and you shouldn't need to edit them frequently, if at all. The structure of your SCons tree should be entirely specified in your SConstruct and SConscript files.
Required packages for Python, R, and Stata should be added to their configuration scripts under config
. If you require the installation of many packages for another language, we suggest you set up a new configuration script in the same location. Other requirements can be added a la carte to the Prerequisites section of this readme.
See here.
The MIT License (MIT)
Copyright (c) 2017 Matthew Gentzkow, Jesse Shapiro
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.