Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Project homepage

Requirements to use the cookiecutter template:

Python 2.7 or 3.5
Cookiecutter Python package >= 1.4.0: This can be installed with pip or conda, depending on how you manage your Python packages:

$ pip install cookiecutter

$ conda config --add channels conda-forge
$ conda install cookiecutter
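
If you want to confirm that an installed Cookiecutter meets the 1.4.0 minimum stated above, a small check along these lines may help. This is only a sketch: the helper names `parse_version`, `is_recent_enough`, and `installed_cookiecutter_version` are hypothetical, and it assumes the package exposes a `__version__` attribute, falling back to `None` if it does not.

```python
MIN_COOKIECUTTER = (1, 4, 0)  # minimum version stated in the requirements


def parse_version(text):
    """Turn a string such as '1.7.3' into (1, 7, 3).

    Non-numeric suffixes (e.g. 'rc1') are ignored, and short versions
    such as '1.4' are padded with zeros so comparisons behave as expected.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_recent_enough(version_text, minimum=MIN_COOKIECUTTER):
    """Return True if version_text is at least the required minimum."""
    return parse_version(version_text) >= minimum


def installed_cookiecutter_version():
    """Return the installed version string, or None if unavailable.

    Assumption: the package exposes `__version__`; if it does not,
    this returns None rather than guessing.
    """
    try:
        import cookiecutter
    except ImportError:
        return None
    return getattr(cookiecutter, "__version__", None)
```

For example, `is_recent_enough(installed_cookiecutter_version() or "0")` evaluates to `False` when the package is missing or older than 1.4.0.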

To start a new project, run:

cookiecutter git@github.com:Vanova/cookiecutter-data-science.git

The resulting directory structure

The directory structure of your new project looks like this:

    ├── LICENSE
    ├── Makefile           <- Makefile with commands like `make data` or `make train`
    ├── README.md          <- The top-level README for developers using this project.
    ├── data
    │   ├── external       <- Data from third party sources.
    │   ├── interim        <- Intermediate data that has been transformed.
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    │
    ├── docs               <- A default Sphinx project; see sphinx-doc.org for details
    │
    ├── envs               <- Environment setup files, e.g. Anaconda environment files and a Dockerfile
    ├── experiments
    │   ├── logs
    │   ├── params         <- Training settings, hyperparameters
    │   ├── submissions    <- Model evaluation results and submissions to the challenge leaderboard
    │   ├── system         <- Trained and serialized models, model predictions, or model summaries
    │   └── experiment.py  <- Main script for running a particular experiment; built on the framework in the `src` folder
    │
    ├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
    │                         the creator's initials, and a short `-` delimited description, e.g.
    │                         `1.0-jqp-initial-data-exploration`.
    │
    ├── references         <- Manuals, literature and all other explanatory materials.
    │
    ├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
    │   └── figures        <- Generated graphics and figures to be used in reporting
    │
    ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
    │                         generated with `pip freeze > requirements.txt`
    │
    ├── src                <- Main source code for this project, organized as a framework.
    │
    ├── test_environment.py
    │
    ├── tests              <- Tests for the framework code in the `src` folder
    │   └── data           <- Data used by the tests
    │
    └── tox.ini            <- tox file with settings for running tox; see tox.testrun.org
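
The notebook naming convention in the tree above (a number for ordering, the creator's initials, and a `-` delimited description) can be checked mechanically. The following is a minimal sketch, not part of the template: the function name `parse_notebook_name` is hypothetical, and it assumes lowercase initials, a lowercase alphanumeric description, and an optional `.ipynb` extension.

```python
import re

# Matches names such as "1.0-jqp-initial-data-exploration".
# Assumptions: initials and description are lowercase; the ordering
# number may contain dots (e.g. "1.0" or "2.1.3").
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)"                      # ordering number
    r"-(?P<initials>[a-z]+)"                          # creator's initials
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"    # "-" delimited description
)


def parse_notebook_name(filename):
    """Split a notebook name into its parts, or return None if it
    does not follow the convention."""
    suffix = ".ipynb"
    stem = filename[:-len(suffix)] if filename.endswith(suffix) else filename
    match = NOTEBOOK_NAME.match(stem)
    return match.groupdict() if match else None
```

Calling `parse_notebook_name("1.0-jqp-initial-data-exploration.ipynb")` returns the order, initials, and description as a dictionary, while a name like `Untitled.ipynb` returns `None`.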

Installing development requirements

pip install -r requirements.txt

Running the tests

py.test tests