A simple demo DVC package that automates ML pipelines and makes ML models shareable and reproducible.
Data source (UCI Wine Quality dataset, red wine): http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
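The dataset file uses semicolons, not commas, as its delimiter, so any loader has to be told so explicitly. Below is a minimal standard-library sketch of that; `parse_semicolon_csv` is a hypothetical helper, and the sample rows are made up to mirror the file's layout rather than copied from it:

```python
import csv
import io
from typing import Dict, List


def parse_semicolon_csv(text: str) -> List[Dict[str, str]]:
    """Parse semicolon-delimited CSV text into a list of row dicts.

    The header row supplies the keys; quoted header names such as
    "fixed acidity" are unquoted by the csv module.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


# Made-up sample in the same layout as the dataset (quoted header, ";" separator).
sample = '"fixed acidity";"pH";"quality"\n7.4;3.51;5\n8.1;3.30;6\n'
rows = parse_semicolon_csv(sample)
print(rows[0]["fixed acidity"], rows[1]["pH"])  # prints: 7.4 3.30
```

With pandas, which this project already depends on, the equivalent is `pd.read_csv(url, sep=";")`.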
- Open the project folder in VS Code, then run the commands below:
echo "# dvc-ML-demo-AIOps" >> README.md
git init
git add README.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/USER_NAME/REPO_NAME.git
git push -u origin main
touch .gitignore
The content of .gitignore can be found in the reference repository.
conda create -n dvc-ml python=3.7 -y
conda activate dvc-ml
touch setup.py
Paste the content below into setup.py and update the user-specific fields (author, url, author_email) to match your own account:
from setuptools import setup

# Use README.md as the long description shown on the package page
with open("README.md", "r", encoding="utf-8") as f:
    long_description = f.read()

setup(
    name="src",
    version="0.0.1",
    author="USER_NAME",
    description="A small package for dvc ml pipeline demo",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/Ryzxxl/dvc-mlops-project",
    author_email="rakshitraina1234@gmail.com",
    packages=["src"],
    python_requires=">=3.7",
    install_requires=[
        'dvc',
        'pandas',
        'scikit-learn'
    ]
)
touch requirements.txt
Fill in requirements.txt (its content can be found in the reference repository), then install the dependencies:
pip install -r requirements.txt
dvc init
mkdir -p src/utils config
touch config/config.yaml
Content of config/config.yaml (the file name must match the config/config.yaml path used in dvc.yaml):
data_source: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
artifacts:
  artifacts_dir: artifacts
  raw_local_dir: raw_local_dir
  raw_local_file: data.csv
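The three `artifacts` keys are meant to be joined into a single path, and that path should equal the `outs` entry for the stage in dvc.yaml (`artifacts/raw_local_dir/data.csv`). A small sketch of the join, assuming the three keys are nested under `artifacts` and the YAML has already been parsed into a dict; the function name `raw_data_path` is hypothetical:

```python
import os
from typing import Any, Dict


def raw_data_path(config: Dict[str, Any]) -> str:
    """Build artifacts_dir/raw_local_dir/raw_local_file from the parsed config."""
    artifacts = config["artifacts"]
    return os.path.join(
        artifacts["artifacts_dir"],
        artifacts["raw_local_dir"],
        artifacts["raw_local_file"],
    )


# The config written out as the nested dict a YAML loader would return.
config = {
    "data_source": "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
    "artifacts": {
        "artifacts_dir": "artifacts",
        "raw_local_dir": "raw_local_dir",
        "raw_local_file": "data.csv",
    },
}

path = raw_data_path(config)
print(path)  # on Linux/macOS: artifacts/raw_local_dir/data.csv
```

If the stage writes anywhere else, DVC will not find the output it was told to track.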
touch src/stage_01_load_save.py src/utils/all_utils.py
The content of both files can be found in the reference repository.
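Since the actual contents live in the reference repository, here is only a rough sketch of what `src/stage_01_load_save.py` might do: read the config, load the CSV from `data_source`, and write it to the path declared under `outs` in dvc.yaml. The helper names (`read_yaml`, `create_directory`, `load_and_save`, `parse_args`) are hypothetical stand-ins for whatever `src/utils/all_utils.py` provides; they are inlined so the sketch runs on its own, and it assumes PyYAML is installed alongside pandas:

```python
import argparse
import os
from typing import Any, Dict, List, Optional

import pandas as pd
import yaml  # PyYAML: an assumption, it is not in the install_requires list above


def read_yaml(path: str) -> Dict[str, Any]:
    """Load a YAML file into a dict (hypothetical stand-in for an all_utils helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


def create_directory(dirs: List[str]) -> None:
    """Create each directory, skipping ones that already exist (hypothetical helper)."""
    for d in dirs:
        os.makedirs(d, exist_ok=True)


def load_and_save(config_path: str) -> str:
    """Read data_source (URL or local path) and save a comma-separated copy."""
    config = read_yaml(config_path)
    # The wine-quality file is semicolon-delimited, hence sep=";".
    df = pd.read_csv(config["data_source"], sep=";")

    artifacts = config["artifacts"]
    out_dir = os.path.join(artifacts["artifacts_dir"], artifacts["raw_local_dir"])
    create_directory([out_dir])
    out_path = os.path.join(out_dir, artifacts["raw_local_file"])
    df.to_csv(out_path, index=False)  # to_csv writes commas by default
    return out_path


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the --config flag that the dvc.yaml cmd passes to this stage."""
    parser = argparse.ArgumentParser(description="Stage 01: load and save raw data")
    parser.add_argument("--config", default="config/config.yaml")
    return parser.parse_args(argv)
```

Run as a script, the file would end with `if __name__ == "__main__": load_and_save(parse_args().config)`, which is what the `cmd` line in dvc.yaml invokes. Keeping the parsing in a function with an optional `argv` also makes the stage easy to call from tests.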
touch dvc.yaml
Content of dvc.yaml:
stages:
  load_data:
    cmd: python src/stage_01_load_save.py --config=config/config.yaml
    deps:
      - src/stage_01_load_save.py
      - src/utils/all_utils.py
      - config/config.yaml
    outs:
      - artifacts/raw_local_dir/data.csv
dvc repro
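`dvc repro` reruns a stage only when its `cmd`, `deps`, or `outs` differ from what was recorded in `dvc.lock`, and DVC records file state there as MD5 checksums. The toy sketch below imitates that idea with `hashlib` for a single dependency. It is a simplified illustration, not DVC's actual implementation, and `file_md5`, `snapshot`, and `stage_is_stale` are hypothetical names:

```python
import hashlib
import os
import tempfile
from typing import Dict, List


def file_md5(path: str) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(deps: List[str]) -> Dict[str, str]:
    """Record the current hash of every dependency, like a tiny lock file."""
    return {dep: file_md5(dep) for dep in deps}


def stage_is_stale(deps: List[str], lock: Dict[str, str]) -> bool:
    """True if any dep is missing from the lock or its hash no longer matches."""
    return any(lock.get(dep) != file_md5(dep) for dep in deps)


# Demo: a throwaway file stands in for one of the load_data deps.
with tempfile.TemporaryDirectory() as tmp:
    dep = os.path.join(tmp, "config.yaml")
    with open(dep, "w", encoding="utf-8") as f:
        f.write("raw_local_file: data.csv\n")
    lock = snapshot([dep])
    before = stage_is_stale([dep], lock)  # False: nothing has changed yet
    with open(dep, "a", encoding="utf-8") as f:
        f.write("# edited\n")
    after = stage_is_stale([dep], lock)  # True: the dep's hash changed
print(before, after)  # prints: False True
```

This is why editing config/config.yaml or either source file listed under `deps` causes `dvc repro` to run the stage again, while an unchanged pipeline is skipped.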
git add .
git commit -m "stage 01 added"
git push origin main