This template was built after reading the Medium article by khuyetran1401. Forking that repo would have been much simpler, but I preferred to build it myself to understand each component. It is designed to be quick and easy to use.
For 'industrial' or more business-oriented projects, I still prefer tools like Kedro.
✅ Automatically build the repository structure for personal DS projects
✅ Create and build an environment using conda
🔲 Run tests automatically
🔲 Manage configuration variables for data pipelines and projects
✅ Enforce type hints and code quality
🔲 Automatically document code
🔲 Automate code
✅ DVC for data management and experiment management
- Automate setup of the DVC repo and .gitignore
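The `.gitignore` part of that automation could look roughly like the sketch below. It is an illustration under stated assumptions, not the template's actual setup code: the helper name `ensure_gitignore_entries` and the example entry `/data/` are hypothetical. Note that DVC itself also writes `.gitignore` entries when files are tracked with `dvc add`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def ensure_gitignore_entries(gitignore: Path, entries: list[str]) -> list[str]:
    """Append entries that are missing from a .gitignore file.

    Hypothetical helper for illustration; returns the entries that were added.
    """
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    present = {line.strip() for line in existing}

    missing = []
    for entry in entries:
        if entry not in present:
            missing.append(entry)
            present.add(entry)  # guards against duplicates within `entries`

    if missing:
        gitignore.write_text("\n".join(existing + missing) + "\n")
    return missing


if __name__ == "__main__":
    # Demo in a temporary folder so no real .gitignore is touched.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / ".gitignore"
        print(ensure_gitignore_entries(target, ["/data/"]))
        print(target.read_text())
```

Running the helper a second time with the same entries leaves the file unchanged, so it is safe to call from a repeatable setup step.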
- Conda: Package, dependency and environment management
- pre-commit: framework for managing and maintaining multi-language pre-commit hooks.
.
├── config                  # Project configuration files
│   └── environment.yml     # Environment file for conda
├── data                    # Local project data (not committed to version control)
│   ├── 01_raw              # Raw immutable data
│   ├── 02_primary          # Domain model data
│   ├── 03_feature          # Model features
│   ├── 04_model_input      # Often called 'master tables'
│   ├── 05_model_output     # Data generated by model runs
│   └── 06_reporting        # Ad hoc descriptive cuts
├── docs                    # Project documentation
├── models                  # Trained models
├── notebooks               # Project-related Jupyter notebooks (for experimental code before it moves to src)
├── README.md               # Project README
└── src                     # Project source code
    └── main.py
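
The data layers listed above can also be recreated by hand, for example in an existing project. The following is a minimal sketch rather than the template's own code: the function name `create_data_layers` is an assumption made for illustration, and only the folder names come from the structure above.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Folder names taken from the project structure; everything else is illustrative.
DATA_LAYERS = [
    "01_raw",
    "02_primary",
    "03_feature",
    "04_model_input",
    "05_model_output",
    "06_reporting",
]


def create_data_layers(project_root: Path) -> list[Path]:
    """Create data/<layer> folders under project_root and return their paths."""
    created = []
    for layer in DATA_LAYERS:
        layer_dir = project_root / "data" / layer
        # exist_ok=True makes the call safe to repeat on an existing project.
        layer_dir.mkdir(parents=True, exist_ok=True)
        created.append(layer_dir)
    return created


if __name__ == "__main__":
    # Demo in a temporary folder so the current directory is left untouched.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_data_layers(Path(tmp)):
            print(path.relative_to(tmp))
```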
Install Cookiecutter:
pip install cookiecutter
Create a project based on the template:
cookiecutter https://github.com/radema/datascience-personal-templates
Activate the new environment:
conda activate {{cookiecutter.environment_name}}
Run the setup from the project directory:
cd {{cookiecutter.repository-name}} && make setup