A containerized Python framework for a better Data X development workflow, where X = Science, Engineering, Analytics, etc.
These are the requirements for your local machine, ideally running Linux:
On your local machine:
- install the ms-vscode-remote.remote-containers extension locally,
- follow the instructions in the Docker docs to ensure that $USER has root-level access to Docker (i.e., can run docker commands without sudo).
The next time you open VS Code in the project directory, VS Code should already be running in the greenhouse container, as specified in the manifest .devcontainer.json.
Note that VS Code will run initialization commands that may take some time to complete.
VS Code will already include the ms-python.python extension, without the need to install it on your own local machine.
Installation of build-essential (Debian):
sudo apt-get update
sudo apt-get install build-essential
Installation of Python 3 (Debian):
sudo apt-get update
sudo apt-get install python3
Installation of pip (Debian):
sudo apt-get update
sudo apt-get install python3-pip
pip3 install pre-commit
pre-commit install
pre-commit migrate-config
pre-commit autoupdate
Or, simply run make install-requirements.
- Dockerfile to define container
- Docker-compose with services
- VS Code integration with Docker
- Makefile with definitions of commands, e.g. make release
- Git hooks
- linting
- testing (pytest)
- Python Template for the Machine Learning Pipeline
- Reading Data
- Data Cleansing
- Feature Engineering
- Exploratory Data Analysis
- Model Development
- Performance Monitoring (logs)
- Model Interpretation (SHAP)
- Model Versioning, e.g. MLFlow, DVC, CML
- API
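
To make the pipeline stages listed above more concrete, here is a minimal sketch of how a few of them (Reading Data, Data Cleansing, Feature Engineering, Model Development, and logging for Performance Monitoring) might be chained together. It is not the template shipped with this project: every function name, the toy data, and the one-parameter model are hypothetical and only illustrate the shape of such a pipeline using the Python standard library.

```python
import logging
from typing import Dict, List, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")  # hypothetical logger name

Row = Dict[str, Optional[float]]


def read_data() -> List[Row]:
    """Reading Data: toy rows stand in for a real data source (assumption)."""
    return [
        {"x": 1.0, "y": 2.0},
        {"x": 2.0, "y": None},  # row with a missing target value
        {"x": 3.0, "y": 6.0},
    ]


def clean(rows: List[Row]) -> List[Row]:
    """Data Cleansing: drop rows that contain any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_features(rows: List[Row]) -> List[Row]:
    """Feature Engineering: add a squared term as an example feature."""
    return [{**r, "x_squared": r["x"] ** 2} for r in rows]


def fit(rows: List[Row]) -> float:
    """Model Development: least-squares slope w for the model y = w * x."""
    numerator = sum(r["x"] * r["y"] for r in rows)
    denominator = sum(r["x"] ** 2 for r in rows)
    return numerator / denominator


def run_pipeline() -> float:
    """Run the stages in order and log progress along the way."""
    rows = read_data()
    logger.info("read %d rows", len(rows))
    rows = add_features(clean(rows))
    logger.info("%d rows left after cleansing", len(rows))
    weight = fit(rows)
    logger.info("fitted weight: %s", weight)
    return weight


if __name__ == "__main__":
    run_pipeline()
```

Keeping each stage as a small function with plain inputs and outputs is one possible design choice: it lets pytest exercise every stage in isolation, which fits the testing setup mentioned in the feature list.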