- Open a project folder in VS Code, then run the commands below:
echo "# dvc_tutorial" >> README.md
git init
git add README.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/USER_NAME/REPO_NAME.git
git push -u origin main
touch .gitignore
The content of .gitignore can be copied from the reference repository.
conda create -n dvc-ml python=3.9 -y
conda activate dvc-ml
- To use the src folder as a package, create a setup.py file:
touch setup.py
- Paste the content below into setup.py and replace the author, email, and URL values with your own:
from setuptools import setup

with open("README.md", "r", encoding="utf-8") as f:
    long_description = f.read()

setup(
    name="src",
    version="0.0.1",
    author="USER_NAME",
    description="A small package for dvc ml pipeline demo",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/rohit-chandra/dvc_tutorial",
    author_email="rohitv.chandra@gmail.com",
    packages=["src"],
    python_requires=">=3.9",
    install_requires=[
        'dvc',
        'pandas',
        'scikit-learn'
    ]
)
- To verify that src works as a package, run the command below once the package is installed; the src package and its version should appear in the list:
pip list
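
The same check can also be done from Python. This is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_version` is hypothetical and does not come from the reference repository:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "src" is the distribution name declared in setup.py.
print(installed_version("src"))
```

If this prints None, the package has not yet been installed into the active environment.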
touch requirements.txt
pip install -r requirements.txt
Content of requirements.txt: refer to the reference repository.
dvc init
mkdir -p src/utils config
touch config/config.yaml
Content of config.yaml:
data_source: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
artifacts:
  artifacts_dir: artifacts
  raw_local_dir: raw_local_dir
  raw_local_file: data.csv
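
To illustrate how these keys fit together, the sketch below joins them into the path of the raw data file. It assumes the YAML has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and that the three directory and file keys are nested under `artifacts`; the function name `raw_local_file_path` is hypothetical.

```python
from pathlib import PurePosixPath

# Dictionary mirroring the config file after parsing (values copied from it).
config = {
    "data_source": "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
    "artifacts": {
        "artifacts_dir": "artifacts",
        "raw_local_dir": "raw_local_dir",
        "raw_local_file": "data.csv",
    },
}


def raw_local_file_path(cfg: dict) -> str:
    """Build <artifacts_dir>/<raw_local_dir>/<raw_local_file> from the config."""
    artifacts = cfg["artifacts"]
    path = PurePosixPath(
        artifacts["artifacts_dir"],
        artifacts["raw_local_dir"],
        artifacts["raw_local_file"],
    )
    return str(path)


print(raw_local_file_path(config))  # artifacts/raw_local_dir/data.csv
```

The resulting path is the one listed under outs in dvc.yaml.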
touch src/stage_01_load_save.py src/utils/all_utils.py
The content of both files can be taken from the reference repository.
touch dvc.yaml
content of dvc.yaml file -
stages:
  load_data:
    cmd: python src/stage_01_load_save.py --config=config/config.yaml
    deps:
      - src/stage_01_load_save.py
      - src/utils/all_utils.py
      - config/config.yaml
    outs:
      - artifacts/raw_local_dir/data.csv
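
The cmd entry passes the config path to the script through a `--config` flag. Below is a minimal sketch of how a stage script might read that flag with `argparse`. The actual stage_01_load_save.py lives in the reference repository and may differ; `parse_args`, the `-c` short flag, and the default value are assumptions made for illustration.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command-line arguments used by the stage command."""
    parser = argparse.ArgumentParser(description="Stage 01: load and save data")
    # The default is an assumption; dvc.yaml passes the path explicitly.
    parser.add_argument(
        "--config",
        "-c",
        default="config/config.yaml",
        help="path to the YAML configuration file",
    )
    return parser.parse_args(argv)


# Equivalent to: python src/stage_01_load_save.py --config=config/config.yaml
args = parse_args(["--config=config/config.yaml"])
print(args.config)  # config/config.yaml
```
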
dvc repro
git add .
git commit -m "stage 01 added"
git push origin main