/data-science-template

Template for a data science project

Primary LanguagePython

View Article

Data Science Cookie Cutter for Prefect

Why Should You Use This Template?

This template is the result of my years refining the best way to structure a data science project so that it is reproducible and maintainable.

This template allows you to:

✅ Create a readable structure for your project

✅ Automatically run tests when committing your code

✅ Enforce type hints at runtime

✅ Check issues in your code before committing

✅ Efficiently manage the dependencies in your project

✅ Create short and readable commands for repeatable tasks

✅ Rerun only modified components of a pipeline

✅ Automatically document your code

✅ Observe and automate your code

Tools used in this project

Project structure

.
├── data            
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   ├── raw                         # raw data
├── docs                            # documentation for your project
├── .flake8                         # configuration for flake8 - a Python formatter tool
├── .gitignore                      # ignore files that cannot commit to Git
├── Makefile                        # store useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # dependencies for poetry
├── README.md                       # describe your project
├── src                             # store source code
│   ├── __init__.py                 # make src a Python module
│   ├── config.py                   # store configs 
│   ├── process.py                  # process data before training model
│   ├── run_notebook.py             # run notebook
│   └── train_model.py              # train model
└── tests                           # store tests
    ├── __init__.py                 # make tests a Python module 
    ├── test_process.py             # test functions for process.py
    └── test_train_model.py         # test functions for train_model.py

How to use this project

Install Cookiecutter:

pip install cookiecutter

Create a project based on the template:

cookiecutter https://github.com/khuyentran1401/data-science-template

Resources