radema/datascience-personal-templates

Repository to track my templates for personal projects


Data Science Project Template

This template was built after reading the Medium article by khuyentran1401. Forking the author's repo would have been simpler, but I preferred to build it myself to understand each component. It is designed to be quick and easy to use.

For 'industrial' or business-oriented projects, I still prefer tools like Kedro.

Features and Roadmap

✅ Automatically build the repository structure for personal DS projects

✅ Create and build an environment using conda

🔲 Run tests automatically

🔲 Manage configuration variables for data pipelines and projects

✅ Enforce type hints and code quality

🔲 Automatically document code

🔲 Automate code

✅ DVC for data management and experiment management

To Do

  • Automate the setup of the DVC repo and .gitignore
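One possible shape for this to-do item is sketched below. It is not part of the template: `ignore_once` and `setup_dvc_repo` are hypothetical helper names, and the ignored patterns are assumptions chosen only for illustration. The sketch relies on `git init` and `dvc init`, and on the fact that `dvc init` expects to run inside a Git repository.

```shell
#!/bin/sh
# Hypothetical helper: append a pattern to a .gitignore file
# only if that exact line is not already present.
ignore_once() {
    pattern="$1"
    file="${2:-.gitignore}"
    touch "$file"
    grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}

# Hypothetical setup step: initialise Git and DVC if needed,
# then register a few common ignore patterns (assumed, not from the template).
setup_dvc_repo() {
    [ -d .git ] || git init || return 1
    # Skip `dvc init` when a .dvc directory already exists.
    [ -d .dvc ] || dvc init || return 1
    ignore_once "__pycache__/"
    ignore_once ".ipynb_checkpoints/"
}
```

Data directories are deliberately not added to .gitignore here, because `dvc add` writes its own ignore entries next to the files it tracks.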

Tools used

  • Conda: package, dependency, and environment management
  • pre-commit: framework for managing and maintaining multi-language pre-commit hooks

Template Structure

.
├── config                       # Project configuration files
│   └── environment.yml          # Environment file for conda
├── data                         # Local project data (not committed to version control)
│   ├── 01_raw                   # Raw immutable data
│   ├── 02_primary               # Domain model data
│   ├── 03_feature               # Model features
│   ├── 04_model_input           # Often called 'master tables'
│   ├── 05_model_output          # Data generated by model runs
│   └── 06_reporting             # Ad hoc descriptive cuts
├── docs                         # Project documentation
├── models                       # Trained model artifacts
├── notebooks                    # Project-related Jupyter notebooks (for experimental code before it moves to src)
├── README.md                    # Project README
└── src                          # Project source code
    └── main.py
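For readers who want to see what the layout amounts to without running Cookiecutter, the tree above can be approximated by hand. This is a minimal sketch, not how the template itself generates files: `create_skeleton` is a hypothetical function name, and the placeholder files are created empty.

```shell
#!/bin/sh
# Hypothetical helper: recreate the directory layout shown above
# under the given root directory (defaults to the current directory).
create_skeleton() {
    root="${1:-.}"
    for dir in config \
        data/01_raw data/02_primary data/03_feature \
        data/04_model_input data/05_model_output data/06_reporting \
        docs models notebooks src; do
        mkdir -p "$root/$dir" || return 1
    done
    # Create empty placeholder files, leaving existing files untouched.
    for file in config/environment.yml README.md src/main.py; do
        [ -e "$root/$file" ] || : > "$root/$file"
    done
}
```

Calling `create_skeleton my-project` would build the tree under a `my-project` directory; the function is safe to run twice because `mkdir -p` and the existence check skip anything already present.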

How to use this template

Install Cookiecutter:

pip install cookiecutter

Create a project based on the template:

cookiecutter https://github.com/radema/datascience-personal-templates

Activate the new environment:

conda activate {{cookiecutter.environment_name}}

Run the setup from a terminal:

cd {{cookiecutter.repository-name}} && make setup
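The steps above assume that `cookiecutter`, `conda`, and `make` are all available on the PATH. A small pre-flight check can make a missing tool obvious before anything is generated. This is only a sketch; `require_tools` is a hypothetical helper and not part of the template.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print every required command that is
# missing from PATH and return non-zero if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `require_tools cookiecutter conda make` before the first step returns success only when all three commands are found.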

Resources and references