A Data Project Template for Building Robust, Reproducible and Maintainable Predictive Solutions
guane enterprises
About • Features • Life Cycle • Contribute • Authors • License
ds-ml-project-template is a Python library for training, testing, and reporting on the FTL-Pricing predictive models. It is designed to train and generate the machine learning and deep learning models that predict the base transportation cost of the FTL modality in the United States and Canada.
ds-ml-project-template is built on Python 3.9 with pandas, NumPy, scikit-learn, matplotlib, seaborn, and plotly, among others, to preprocess the data, build the machine learning models, and visualize the results.
For development, the library uses:
- Formatting with black
- Import sorting with isort
- Linting with flake8
- Git hooks that run all the above with pre-commit
- Testing with pytest
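As an illustration of the testing convention, pytest collects functions whose names start with `test_` and reports any failed `assert`. The sketch below is only an example of that shape: `cost_per_mile` is a hypothetical helper and is not part of this library.

```python
import math


def cost_per_mile(total_cost, miles):
    """Hypothetical helper (not part of this library): cost divided by distance."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    return total_cost / miles


def test_cost_per_mile_basic():
    # pytest collects functions named test_* and reports failed assertions.
    assert math.isclose(cost_per_mile(500.0, 250.0), 2.0)


def test_cost_per_mile_rejects_zero_distance():
    # Written without pytest.raises so the file also runs with plain Python.
    try:
        cost_per_mile(500.0, 0.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for zero miles")
```

Saved in a file whose name starts with `test_`, these functions would be picked up by running `pytest` from the project root.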
The main proposal for the data science life cycle is OSEMN, a five-phase life cycle whose name stands for Obtain, Scrub, Explore, Model, and iNterpret.
Another good option is Microsoft TDSP: the Team Data Science Process combines many modern agile practices with the data science life cycle. It has five steps: Business Understanding, Data Acquisition and Understanding, Modeling, Deployment, and Customer Acceptance.
If you think the two should be combined into a life cycle of your own, feel free to do so.
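One way to picture the five OSEMN phases is as a chain of small functions, each feeding the next. The sketch below is only an illustration under stated assumptions: the records, the column names `miles` and `cost`, and the one-variable least-squares model are all hypothetical and do not describe the actual FTL-Pricing models. It uses only the standard library so it runs on Python 3.9 without extra dependencies.

```python
from statistics import mean

# Hypothetical toy records; the real project data is not shown in this README.
RAW_ROWS = [
    {"miles": "100", "cost": "250"},
    {"miles": "200", "cost": "450"},
    {"miles": "", "cost": "300"},  # missing distance, dropped in scrub()
    {"miles": "300", "cost": "650"},
]


def obtain():
    """Obtain: return raw records (an in-memory stand-in for a data source)."""
    return list(RAW_ROWS)


def scrub(rows):
    """Scrub: drop incomplete rows and cast the remaining values to float."""
    return [
        (float(r["miles"]), float(r["cost"]))
        for r in rows
        if r["miles"] and r["cost"]
    ]


def explore(pairs):
    """Explore: compute simple summary statistics."""
    return {
        "n": len(pairs),
        "mean_miles": mean(x for x, _ in pairs),
        "mean_cost": mean(y for _, y in pairs),
    }


def model(pairs):
    """Model: fit cost = slope * miles + intercept by ordinary least squares."""
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


def interpret(slope, intercept):
    """iNterpret: turn the fitted parameters into a readable statement."""
    return f"cost ~ {slope:.2f} * miles + {intercept:.2f}"


if __name__ == "__main__":
    pairs = scrub(obtain())
    print(explore(pairs))
    print(interpret(*model(pairs)))
```

Keeping each phase as its own function makes it straightforward to swap in TDSP-style steps or a combined life cycle without touching the other phases.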
First, make sure Python 3.9 is installed before enabling pipenv. If your installed version is different, you can create a conda environment with:
# Create and activate a Python 3.9 virtual environment
$ conda create -n py39 python=3.9
$ conda activate py39
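To confirm that the active interpreter matches the version this README asks for, a short check such as the following can be run inside the environment. This is a minimal sketch: the `is_supported` helper and the `REQUIRED` constant are hypothetical names introduced for illustration, not part of the library.

```python
import sys

REQUIRED = (3, 9)  # version stated in this README


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major and minor version match the required pair."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported() else "mismatch"
    print(f"Python {found}: {status} (expected {REQUIRED[0]}.{REQUIRED[1]})")
```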
Now you can manage the project dependencies with Pipenv. To create the virtual environment and install all dependencies, run:
# Install pipx if pipenv and cookiecutter are not installed
$ python3 -m pip install pipx
$ python3 -m pipx ensurepath
# Install pipenv using pipx
$ pipx install pipenv
# Create pipenv virtual environment
$ pipenv shell
# Install dependencies
$ pipenv install --dev
Once the dependencies are installed, register the new Python environment with Jupyter by creating a kernel:
$ ipython kernel install --user --name KERNEL_NAME
Finally, before making any changes to the library, be sure to review the GitFlow guide, and make all changes outside of the master branch.
👤 guane Data Science and Machine Learning (DS&ML) Team
Copyright 2023 © guane enterprises. All rights reserved.