
J. Dingel's template for research projects


Project template

This repository contains a project template. I intend this repo as the common starting point for research projects. It can also be used to onboard new co-authors and research assistants. Download the repository and compile logbook/logbook.tex to get started.

The following is the first page of logbook.pdf converted to Markdown.

Research infrastructure

This chapter is intended to introduce everyone to the technology and tools that are integral to our workflow. It describes our research infrastructure in terms of three types of issues:

  • Code: organize it, write it, run it, track it

  • Collaboration: assigning tasks, sharing code, reviewing code, reporting results

  • Computing: geeky details

We organize the project as a series of tasks, so code and data are organized task by task. After writing code, we automate its execution via make. We track our code (and the rest of the project) using Git, a version control system. Collaboration occurs via Asana task assignments, GitHub/Bitbucket code reviews, and logbook entries that share research designs and results.
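As a rough illustration of what "automating execution via make" buys us: make reruns a recipe only when its target file is missing or when some prerequisite was modified more recently than the target. The Python sketch below mimics that timestamp rule for a single rule. It is not part of the template, and the function name `needs_rebuild` and the paths in the usage note are hypothetical.

```python
import os
from typing import Iterable


def needs_rebuild(target: str, prerequisites: Iterable[str]) -> bool:
    """Return True if `target` is out of date under make's timestamp rule.

    A target is stale if it does not exist, or if any prerequisite has a
    modification time strictly newer than the target's. A missing
    prerequisite raises FileNotFoundError (make would also stop with an error).
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For example, `needs_rebuild("../output/clean.csv", ["clean.py", "../input/raw.csv"])` would return True after either the (hypothetical) cleaning script or its raw input is edited. make applies this check recursively along the whole chain of rules, which is what lets a single `make` call bring a task's outputs up to date.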

Use a good text editor like Sublime Text, Atom, or VS Code to write code, slides, and papers. Word processors aren't text editors. Your text editor should, at minimum, offer you syntax highlighting, tab autocomplete, and multiple selection.

Our approach assumes that you'll use Unix/Linux/macOS. Plain-text social science lives at the *nix command line. Gentzkow and Shapiro: "The command line is our means of implementing tools." Here are four intros to the Linux shell:

Getting started at the command line can be a little overwhelming, but it's well worth it. While you can use GUI apps to interact with most of our workflow (e.g., SourceTree for version control), automation of some key parts relies on shell scripts. See logbook entry A.9 for a haphazard collection of shell tips.

Beyond *nix, the rest of the research workflow is language-agnostic: it applies to everything from Stata to Julia. In fact, the task-based approach naturally facilitates using different languages for different tasks.

I have five criteria in mind when evaluating a research workflow:

  • Replicability: Can the research results be reproduced starting from the raw data?

  • Portability: If I install a fresh copy of the project on a new computer, what are the startup costs before I can run the code?

  • Modularity: Can a coauthor work on a task using the provided inputs without having to look upstream at the code that produced those inputs?

  • Dependencies: In the event of a data update, how do I know which pieces of code need to be run (and in what order)?

  • History: If results have changed, can I discern the relevant code changes and their authors?
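To make the Dependencies question concrete: if each task lists the upstream tasks whose outputs it reads, the tasks form a directed acyclic graph, and after a data update you must rerun the updated task plus everything downstream of it, in topological order. The Python sketch below, assuming a hypothetical task graph with illustrative task names (not the template's own), computes that list with the standard library's `graphlib`; in practice make resolves this kind of ordering from the prerequisites declared in each Makefile.

```python
from graphlib import TopologicalSorter


def tasks_to_rerun(upstream: dict[str, set[str]], changed: str) -> list[str]:
    """Return `changed` and every task downstream of it, in a valid run order.

    `upstream` maps each task to the set of tasks whose outputs it reads.
    Raises graphlib.CycleError if the task graph contains a cycle.
    """
    # Invert the graph: for each task, which tasks consume its outputs?
    downstream: dict[str, set[str]] = {task: set() for task in upstream}
    for task, deps in upstream.items():
        for dep in deps:
            downstream.setdefault(dep, set()).add(task)

    # Depth-first search: collect every task reachable from the changed one.
    affected: set[str] = set()
    stack = [changed]
    while stack:
        task = stack.pop()
        if task not in affected:
            affected.add(task)
            stack.extend(downstream.get(task, ()))

    # Order the whole graph so upstream tasks come first, keep affected ones.
    order = TopologicalSorter(upstream).static_order()
    return [task for task in order if task in affected]


# Hypothetical task graph: task -> tasks whose outputs it uses as inputs.
tasks = {
    "download_data": set(),
    "clean_data": {"download_data"},
    "summary_stats": {"clean_data"},
    "estimate": {"clean_data"},
    "paper": {"summary_stats", "estimate"},
}
print(tasks_to_rerun(tasks, "clean_data"))
```

Here updating `clean_data` requires rerunning `clean_data` first and `paper` last, with `summary_stats` and `estimate` in between (in either order), while `download_data` is untouched. The modularity criterion is what makes this work: each task reads only its declared inputs, so the graph captures every dependency.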

After reading the rest of this chapter, you should be able to say how our workflow answers each of these questions.