
cookiecutter-research-template

Warning

This cookiecutter is currently unmaintained and development will not be resumed in the near future. A similar and maintained template can be found at https://github.com/hmgaudecker/econ-project-templates.

I have started developing https://github.com/pytask-dev/pytask, a replacement for the build system used in this template, which will soon also be used in econ-project-templates. I recommend that you check out both projects if you are interested.

Introduction

This repository lays out the structure for a reproducible research project based on the Waf framework.

It is derived from https://github.com/hmgaudecker/econ-project-templates, and the authors of that project deserve all the credit for implementing Waf as a framework for reproducible research. My contribution is a set of helpers around the project that are common in software engineering and should help researchers write better code.

Installation

This is a Cookiecutter template. To use it, first install Cookiecutter by running

$ pip install cookiecutter

After that, you can quickly set up a new research project with this template by typing

$ cookiecutter https://github.com/tobiasraabe/cookiecutter-research-template.git

Once you have answered all the prompts, a folder <project-name> is created in your current directory.

One of the last prompts asks whether the template should create a conda environment from the pre-configured environment.yml. If that is not what you want, stick to the default answer. You can still create the environment later by running

$ conda env create -f environment.yml -n <env-name>

Finally, run

$ conda activate <env-name>     # to activate the environment.
$ git init                      # to initialize a git repository.
$ pre-commit install            # to install pre-commit hooks.

Happy research!

Features

The template offers several features:

Automatic dependency updates with pyup: Connect your GitHub repository with https://pyup.io and you get automatic PRs whenever one of your dependencies is outdated.
Automatic testing with Azure Pipelines: Connect your GitHub repository with https://azure.microsoft.com/de-de/services/devops/pipelines/ and the master branch and all PRs are tested automatically. You can see the results online.

Testing with tox

Tox is a framework that lets you define tests and run them in isolated environments. To run all tests defined in tox.ini, run

$ tox

Quality checks on commits with pre-commit

pre-commit runs checks before every commit and aborts the process if a violation is found.

Code Formatting with black and reorder-python-imports

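Both tools will quickly improve the code quality of your project. pre-commit run accepts a single hook id, so run them one after the other (-a is short for --all-files):

$ pre-commit run black --all-files
$ pre-commit run reorder-python-imports --all-files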

Linting

Linting is the process of validating the syntax in code or documentation files. This template offers three ways to lint your project.

flake8 and its extensions check your Python files for potential errors, violations of naming conventions, TODO directives, etc. To check your documentation files and other .rst files in your project, use doc8 and restructuredtext-lint. All three checks are included as pre-commit hooks, but you can also run each of them on its own by passing its hook id, for example

$ pre-commit run flake8 -a

To test whether the documentation is built successfully, run

$ tox -e sphinx

Customizing matplotlib

If you are tired of setting the same old options like figsize=(12, 8) for every graph, you are in luck. The solution is a matplotlibrc, for which the template ships a predefined file. It is a configuration file for matplotlib which lets you define your personal defaults. The file resides in src/figures/matplotlibrc and is copied over to bld because that is the working directory of the Python interpreter running your project. From there, the matplotlibrc and its settings are picked up automatically.

Downloading data for the project

Data often cannot be committed to the repository, either because the files are big and change frequently or because they are confidential. prepare_data_for_project.py offers a way to download files, resume downloads, and validate downloaded files. Add each file to FILES, using the filename on disk as the key and a list as the value, with the URL as the first element and the hash value as the second. Hashes are needed to validate that the downloaded file is identical to the source. This may seem unnecessarily nit-picky, but it takes ages to recognize that your source files are corrupt when you are debugging your project and looking for the usual mistakes.
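The layout of FILES and the hash check described above can be sketched as follows. This is a minimal illustration and not the code in prepare_data_for_project.py: the file name, URL, and digest are placeholders, the helper names compute_sha256 and is_valid are hypothetical, and SHA-256 is an assumption because the hash algorithm is not stated here.

```python
import hashlib
from pathlib import Path

# Hypothetical entry: filename on disk -> [URL, expected hash].
# The URL and the all-zero digest are placeholders, not real project data.
FILES = {
    "example.csv": [
        "https://example.com/data/example.csv",
        "0" * 64,
    ],
}


def compute_sha256(path, chunk_size=65536):
    """Hash a file in chunks so that large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(name, directory="."):
    """Return True if the file exists and its hash matches the entry in FILES."""
    _url, expected_hash = FILES[name]
    path = Path(directory) / name
    return path.is_file() and compute_sha256(path) == expected_hash
```

A file that is missing, truncated, or altered fails the check, so it can be downloaded again before the build starts.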

Cleaning the project

clean.py offers a way to clean your project of artifacts and unused files. It is essentially a more convenient wrapper around git clean.

$ python clean.py

performs a dry-run, so you can be sure that only unnecessary files are deleted. Then, run

$ python clean.py --force

to delete the files.
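The dry-run-by-default pattern can be sketched as a thin wrapper around git clean. This is an illustration under assumptions and not the actual content of clean.py: the function names are hypothetical, and the choice of -d and -x (which also removes files ignored by git, possibly including downloaded data) is an assumption about what counts as an artifact.

```python
import subprocess


def build_git_clean_command(force=False):
    """Build a git clean call that only lists files unless force is set."""
    # -d also covers untracked directories, -x also covers files ignored by git.
    # Both flags are assumptions about what should be cleaned.
    command = ["git", "clean", "-d", "-x"]
    # -n is git's dry-run flag, -f is required to actually delete files.
    command.append("-f" if force else "-n")
    return command


def clean(force=False):
    """Run git clean in the current working directory and return its output."""
    result = subprocess.run(
        build_git_clean_command(force),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Making the dry run the default means an accidental call never deletes anything. Deleting requires the explicit opt-in, which mirrors the --force flag above.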

Visualization of the DAG

A graphic of the DAG is compiled at the end of the Waf build process and serves as a nice picture of the complexity of the project (a little bit of bragging is ok :wink:) or allows for visual debugging.

Others

Waf Tips and Tricks
Writing documentation with Jupyter notebooks (nbsphinx)
Auxiliary scripts for figures in src/figures/auxiliaries.py.
Anaconda on Windows