cookiecutter-research-template
Warning
This cookiecutter is currently unmaintained and development will not be resumed in the near future. A similar and maintained template can be found at https://github.com/hmgaudecker/econ-project-templates.
I started developing https://github.com/pytask-dev/pytask, a replacement for the build system used in this template, which will soon also be used in econ-project-templates. If you are interested, I advise you to check out both projects.
Introduction
This repository lays out the structure for a reproducible research project based on the Waf framework.
It is derived from https://github.com/hmgaudecker/econ-project-templates, and the authors of that project deserve all the credit for implementing Waf as a framework for reproducible research. My contribution is to add several helpers around the project that are common in software engineering and should help researchers write better code.
Installation
This is a Cookiecutter template. To use it, first install Cookiecutter by running
$ pip install cookiecutter
After that, you can quickly set up a new research project with this template by typing
$ cookiecutter https://github.com/tobiasraabe/cookiecutter-research-template.git
Answer all the prompts, and a folder <project-name> is created in your current directory.
One of the last prompts asks whether the template should create a conda environment from the pre-configured environment.yml. If that is not what you want, stick to the default answer. You can still create the environment later by running
$ conda env create -f environment.yml -n <env-name>
Finally, run
$ conda activate <env-name> # to activate the environment.
$ git init # to initialize a git repository.
$ pre-commit install # to install pre-commit hooks.
Happy research!
Features
The template offers several features:
- Automatic dependency updates with pyup
Connect your GitHub repository with https://pyup.io, and you get automatic PRs whenever one of your dependencies is outdated.
- Automatic testing with Azure Pipelines
Connect your GitHub repository with https://azure.microsoft.com/de-de/services/devops/pipelines/, and the master branch and all PRs are tested automatically; you can see the results online.
- Testing with tox
Tox is a framework that lets you define tests and run them in isolated environments. To run all tests defined in tox.ini, type
$ tox
- Quality checks on commits with pre-commit
pre-commit runs checks before every commit and aborts the commit if a violation is found.
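- Code formatting with black and reorder-python-imports
Both tools quickly improve the code quality of your project by formatting it automatically. To run them on all files, type
$ pre-commit run black --all-files
$ pre-commit run reorder-python-imports --all-files
where --all-files can be shortened to -a.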
- Linting
Linting is the process of checking code or documentation files for errors and style violations. This template offers three linters.
flake8 and its extensions check your Python files for potential errors, violations of naming conventions, TODO directives, etc. To check your documentation and other .rst files in your project, use doc8 and restructuredtext-lint. All three linters are included as pre-commit hooks, but you can also run them manually with
$ pre-commit run flake8 -a
$ pre-commit run doc8 -a
$ pre-commit run restructuredtext-lint -a
To test whether the documentation builds successfully, run
$ tox -e sphinx
- Customizing matplotlib
If you are tired of setting the same old options, like figsize=(12, 8), for every graph, you are in luck. The solution is a matplotlibrc file, a configuration file for matplotlib that lets you define your personal defaults; the template ships a predefined one. The file resides in src/figures/matplotlibrc and is copied over to bld, as this is the directory from which the Python interpreter runs your project. Because matplotlib looks for a matplotlibrc in the current working directory, the file and its settings are picked up automatically.
- Downloading data for the project
Data often cannot be committed to the repository because the files are large and changing, or because they are confidential. prepare_data_for_project.py offers a way to download files, resume interrupted downloads, and validate downloaded files. To register a file, add an entry to FILES with the filename on disk as the key and a list as the value, holding the URL as the first element and the hash value as the second. Hashes are needed to validate that the downloaded file is identical to the source. This may seem unnecessarily nit-picky, but when you are debugging your project and looking for typical mistakes, it can take ages to realize that your source files are corrupt.
- Cleaning the project
clean.py removes artifacts and unused files from your project. Essentially, it is a more convenient wrapper around git clean. Running
$ python clean.py
performs a dry run, so you can check that only unnecessary files would be deleted. Then, run
$ python clean.py --force
to delete the files.
- Visualization of the DAG
At the end of the Waf build process, a graphic of the project's directed acyclic graph (DAG) is compiled. It serves as a nice picture of the project's complexity (a little bit of bragging is ok 😉) and allows for visual debugging.
- Others
- Waf Tips and Tricks
- Writing documentation with Jupyter notebooks (nbsphinx)
- Auxiliary scripts for figures in src/figures/auxiliaries.py
- Anaconda on Windows
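The following sketch illustrates the matplotlibrc mechanism behind "Customizing matplotlib". It assumes matplotlib is installed. The file contents are a hypothetical example, not the template's actual src/figures/matplotlibrc. For demonstration, it parses the file explicitly with matplotlib.rc_params_from_file instead of relying on the automatic lookup in the working directory.

```python
"""Sketch: how a matplotlibrc file sets defaults such as figsize.

Assumes matplotlib is installed. RC_CONTENTS is a hypothetical example,
not the template's actual src/figures/matplotlibrc.
"""
import tempfile
from pathlib import Path

import matplotlib

# Hypothetical matplotlibrc contents: one "key: value" setting per line.
RC_CONTENTS = """\
figure.figsize: 12, 8
figure.dpi: 100
"""


def load_rc(directory: Path):
    """Write a matplotlibrc into `directory` and parse it with matplotlib."""
    rc_file = directory / "matplotlibrc"
    rc_file.write_text(RC_CONTENTS)
    # rc_params_from_file parses the file and validates every setting,
    # filling in all other options from matplotlib's defaults.
    return matplotlib.rc_params_from_file(str(rc_file))


with tempfile.TemporaryDirectory() as tmp:
    params = load_rc(Path(tmp))

print(params["figure.figsize"])
```

Inside the project this explicit call is unnecessary: having the file in the working directory (bld) is enough for matplotlib to apply it when it is imported.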
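For "Downloading data for the project", a minimal sketch of the validation and resume logic might look as follows. It is not the code of prepare_data_for_project.py: the choice of SHA-256, resuming via an HTTP Range header, and the FILES entry (example URL and hash placeholder) are all assumptions made for illustration.

```python
"""Sketch of download validation and resumption.

Assumptions (not taken from the template): hashes are SHA-256 hex digests
and resuming uses an HTTP Range header. The FILES entry is hypothetical.
"""
import hashlib
from pathlib import Path

# Hypothetical entry: filename on disk -> [url, expected hash].
FILES = {
    "data.csv": [
        "https://example.com/data.csv",
        "<sha256-hex-digest>",
    ],
}


def sha256_of(path: Path, chunk_size: int = 2**20) -> str:
    """Hash the file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path: Path, expected_hash: str) -> bool:
    """Return True if the file exists and matches the expected hash."""
    return path.exists() and sha256_of(path) == expected_hash


def resume_headers(path: Path) -> dict:
    """Request only the missing bytes of a partially downloaded file."""
    if path.exists() and path.stat().st_size > 0:
        return {"Range": f"bytes={path.stat().st_size}-"}
    return {}
```

A real downloader would send the headers from resume_headers with its HTTP request, append the response body to the partial file, and finally call is_valid with the hash stored in FILES.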
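The dry-run-first workflow of clean.py under "Cleaning the project" can be sketched as a thin wrapper around git clean. This is a hypothetical reimplementation, not the template's script. In particular, the flags -d (also remove untracked directories) and -x (also remove files ignored by .gitignore, such as build artifacts) are assumptions.

```python
"""Sketch of a clean.py-style wrapper around `git clean`.

The flags are assumptions, not necessarily those of the template's clean.py:
-d also removes untracked directories, -x also removes ignored files.
"""
import argparse
import subprocess


def build_command(force: bool) -> list:
    """Return the git clean command; without force, only a dry run."""
    mode = "--force" if force else "--dry-run"
    return ["git", "clean", "-d", "-x", mode]


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Remove untracked files.")
    parser.add_argument(
        "--force", action="store_true", help="Actually delete the files."
    )
    args = parser.parse_args(argv)
    # git clean lists (dry run) or deletes (force) the untracked files.
    return subprocess.run(build_command(args.force)).returncode
```

A script would call main() under the usual if __name__ == "__main__": guard. Defaulting to --dry-run means that forgetting a flag can only list files, never delete them, which matters all the more with -x, since ignored files include everything you deliberately kept out of the repository.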
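To make the idea behind "Visualization of the DAG" concrete, the sketch below encodes a small, hypothetical build as a dependency mapping, derives a valid build order with Python's graphlib (Python 3.9+), and emits Graphviz DOT text that could be rendered into a picture. This is not how Waf produces its graphic, and all task names are made up.

```python
"""Sketch: a task dependency graph, its build order, and Graphviz DOT text.

Not Waf's implementation; the targets below are hypothetical.
"""
from graphlib import TopologicalSorter

# Hypothetical build: each target maps to the targets it depends on.
DEPENDENCIES = {
    "data_clean.csv": {"data_raw.csv"},
    "estimates.pkl": {"data_clean.csv"},
    "figure.png": {"estimates.pkl"},
    "paper.pdf": {"figure.png", "estimates.pkl"},
}


def to_dot(deps: dict) -> str:
    """Return DOT source with one edge per dependency -> target."""
    lines = ["digraph project {"]
    for target in sorted(deps):
        for source in sorted(deps[target]):
            lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# static_order yields every node after all of its dependencies.
build_order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(to_dot(DEPENDENCIES))
```

Piping the output to Graphviz (for example, dot -Tpng) produces the picture. If DEPENDENCIES contained a cycle, static_order would raise graphlib.CycleError, which is exactly why a build graph must be acyclic.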