documenting the DataMade ETL workflow

Making Data, the DataMade Way

This is DataMade's guide to extracting, transforming, and loading (ETL) data using Make, a common command-line utility.

ETL refers to the general process of:

  1. taking raw source data (Extract)
  2. cleaning and reshaping that data, possibly via intermediate derived files (Transform)
  3. ultimately ending up with final output in a usable form (for Loading into whatever consumes the data: an app, a system, a visualization, etc.)

For enthralling insights on how to get from source data to final output, all while minimizing future headaches - read on!

Principles

  1. Treat inputs as immutable - don't modify source data directly
  2. Be able to deterministically produce the final data with one command
  3. Write as little custom code as possible
  4. Use standard tools whenever possible
  5. Source data should be under version control
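
To make these principles concrete, here is a minimal Makefile sketch. It is not taken from the guide itself: the file names (raw/permits.csv, output/permits.csv) and the example.com URL are hypothetical placeholders, and the transform step is just a stand-in that strips carriage returns.

```make
# A minimal sketch; all file names and the URL below are hypothetical.
# Recipe lines must start with a tab character, not spaces.

.PHONY : all clean

# Principle 2: one command (`make all`) produces the final data.
all : output/permits.csv

# Extract: fetch the source file once and never edit it in place (principle 1).
# Committing raw/ to the repository keeps source data under version control
# (principle 5); Make skips this rule whenever the file already exists.
raw/permits.csv :
	mkdir -p raw
	curl -L -o $@ "https://example.com/permits.csv"

# Transform: derive the output with standard tools instead of custom code
# (principles 3 and 4). $< is the first prerequisite, $@ is the target.
output/permits.csv : raw/permits.csv
	mkdir -p output
	tr -d '\r' < $< > $@

# Remove derived files only; the raw inputs are left untouched.
clean :
	rm -rf output
```

Running `make all` builds output/permits.csv, and running it again does nothing until raw/permits.csv changes. Running `make clean` followed by `make all` regenerates the output from the unmodified source.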

The Guide

  1. Make & Makefile Overview
  2. Why Use Make/Makefiles?
  3. Makefile 101
  4. Makefile 201 - Some Fancy Things Built Into Make
  5. ETL Styleguide
  6. Makefile Best Practices
  7. Variables
  8. Processors
  9. Standard Toolkit
  10. ETL Workflow Directory Structure

Code examples

Further reading