make_step

Step-oriented GNU Makefile for reproducible data analysis.

Usage

make

Motivation

Scientific data analysis typically has three characteristics:

  1. The analysis consists of mutually dependent processes.
  2. Each process is often re-executed through trial and error: we check the output figures, edit the source files, and re-execute the necessary processes.
  3. Many files are passed between processes, and these files are often renamed, added, or removed.

Because of these characteristics, we often forget to execute a necessary process and then fail to reproduce our results.

Considering only points 1 and 2, a simple Makefile with explicit dependencies between files and processes would seem to solve this problem (as in Reference 1). Because of point 3, however, this straightforward approach fails: the Makefile must list every file dependency and be updated whenever a file is renamed, added, or removed, and we soon start to neglect that upkeep.
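For illustration, a file-oriented Makefile of the kind described above might look like the following sketch. All file and script names (raw.csv, preprocess.py, and so on) are hypothetical and are not taken from this repository.

```makefile
# File-oriented approach: every intermediate file is named explicitly.
# All file and script names below are hypothetical.
# Recipe lines must start with a tab character.

figure.png: plot.py result.csv
	python plot.py result.csv figure.png

result.csv: analyze.py cleaned.csv
	python analyze.py cleaned.csv result.csv

cleaned.csv: preprocess.py raw.csv
	python preprocess.py raw.csv cleaned.csv
```

If preprocess.py later starts writing a second output file, or cleaned.csv is renamed, each rule that mentions the file has to be edited by hand. A rule that is not updated silently stops describing the real dependency.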

Therefore, I propose a Makefile that tracks dependencies between processes instead of dependencies between files. It checks the timestamps of each step's source files and executes only the necessary processes. It saves the last-execution timestamp of each step in a hidden directory.
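The idea can be sketched with stamp files, as below. This is a minimal illustration under stated assumptions, not the Makefile shipped in this repository: the step names (preprocess, analyze, plot), the src/ layout, the run.py entry points, and the .stamps directory name are all hypothetical.

```makefile
# Step-oriented sketch: dependencies are declared between steps, not files.
# Step names, the src/ layout, and .stamps are assumptions for illustration.
# Recipe lines must start with a tab character.

STAMP_DIR := .stamps

# Source files of each step (assumed to live in one directory per step).
PREPROCESS_SRC := $(wildcard src/preprocess/*.py)
ANALYZE_SRC    := $(wildcard src/analyze/*.py)
PLOT_SRC       := $(wildcard src/plot/*.py)

.PHONY: all clean
all: $(STAMP_DIR)/plot

# A step reruns when one of its sources, or the stamp of the step it
# depends on, is newer than its own stamp file.
$(STAMP_DIR)/preprocess: $(PREPROCESS_SRC) | $(STAMP_DIR)
	python src/preprocess/run.py
	touch $@

$(STAMP_DIR)/analyze: $(ANALYZE_SRC) $(STAMP_DIR)/preprocess | $(STAMP_DIR)
	python src/analyze/run.py
	touch $@

$(STAMP_DIR)/plot: $(PLOT_SRC) $(STAMP_DIR)/analyze | $(STAMP_DIR)
	python src/plot/run.py
	touch $@

# Order-only prerequisite: create the hidden directory once, without
# letting its changing timestamp trigger reruns.
$(STAMP_DIR):
	mkdir -p $@

# Forget all recorded executions so that every step runs again.
clean:
	rm -rf $(STAMP_DIR)
```

The data files passed between steps never appear in the rules, so renaming or adding them requires no change to the Makefile. The trade-off in this sketch is that make cannot notice a data file that was deleted or edited by hand; only changes to the step sources trigger a rerun.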

Another post

References

  1. http://zmjones.com/make/