"# dvc-ML-demo-AIOps"
Reference repository: https://github.com/c17hawke/dvc-ML-demo-AIOps
- Open the project folder in VS Code, then run the commands below:
echo "# dvc-ML-demo-AIOps" >> README.md
git init
git add README.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/USER_NAME/REPO_NAME.git
git push -u origin main
touch .gitignore
The content of .gitignore can be copied from the reference repository.
Note: create an empty Git repository on GitHub beforehand; the remote URL above must point to it.
conda create -p ./myenv python=3.10 -y
conda activate ./myenv
mkdir -p src/utils
touch src/__init__.py
touch src/utils/__init__.py
touch params.yaml dvc.yaml
mkdir config
touch setup.py
Paste the content below into setup.py and replace the author, email, and URL fields with your own details -
from setuptools import setup

with open("README.md", "r", encoding="utf-8") as f:
    long_description = f.read()

setup(
    name="src",
    version="0.0.1",
    author="sobz2019",
    description="A small package for dvc ml pipeline demo",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/sobz2019/dvc-ML-demo-AIOps",
    author_email="sobz87@gmail.com",
    packages=["src"],
    python_requires=">=3.10",
    install_requires=[
        "dvc",
        "pandas",
        "scikit-learn",
    ],
)
touch requirements.txt
content of requirements.txt -
dvc
pandas
scikit-learn
# local packages
-e .
pip install -r requirements.txt
dvc init
touch config/config.yaml
content of config.yaml -
data_source: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
artifacts:
  artifacts_dir: artifacts
  raw_local_dir: raw_local_dir
  raw_local_file: data.csv
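To illustrate how these keys fit together, here is a minimal Python sketch. It assumes the three keys are nested under `artifacts` and are joined in order to form the output path, which is consistent with the `outs` entry in dvc.yaml. The dict literal stands in for the parsed YAML file, and the function name `raw_local_file_path` is hypothetical.

```python
from pathlib import PurePosixPath

# Stand-in for the parsed config file (assumption: artifacts_dir,
# raw_local_dir and raw_local_file are nested under "artifacts").
config = {
    "data_source": "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
    "artifacts": {
        "artifacts_dir": "artifacts",
        "raw_local_dir": "raw_local_dir",
        "raw_local_file": "data.csv",
    },
}


def raw_local_file_path(cfg: dict) -> str:
    """Join the artifact keys into the path where the raw CSV is saved."""
    art = cfg["artifacts"]
    return str(
        PurePosixPath(art["artifacts_dir"], art["raw_local_dir"], art["raw_local_file"])
    )


print(raw_local_file_path(config))  # artifacts/raw_local_dir/data.csv
```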
touch src/stage_01_load_save.py src/utils/all_utils.py
The content of both files can be taken from the corresponding folder of the reference repository.
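As a rough idea of what this stage might do, the sketch below uses only the standard library. It is not the reference repository's code: the function names `create_directory` and `save_as_comma_csv` are hypothetical, and it assumes the source data is semicolon-delimited (as the UCI wine-quality CSV is) and should be saved with commas. The inline sample holds illustrative values only.

```python
import csv
import io
import os
import tempfile


def create_directory(path: str) -> None:
    """Create a directory (and its parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)


def save_as_comma_csv(raw_text: str, out_path: str, in_delimiter: str = ";") -> int:
    """Rewrite delimited text as a comma-separated file; return the row count."""
    rows = list(csv.reader(io.StringIO(raw_text), delimiter=in_delimiter))
    create_directory(os.path.dirname(out_path))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)


# Tiny inline sample standing in for the downloaded data (illustrative values).
sample = '"fixed acidity";"quality"\n7.4;5\n7.8;6\n'

with tempfile.TemporaryDirectory() as tmp:
    out_file = os.path.join(tmp, "artifacts", "raw_local_dir", "data.csv")
    n_rows = save_as_comma_csv(sample, out_file)
    with open(out_file, encoding="utf-8") as f:
        saved = f.read()

print(n_rows)
print(saved)
```

In the real stage, the text would come from the `data_source` URL and the output path from the config file passed via `--config`.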
touch dvc.yaml
content of dvc.yaml file -
stages:
  load_data:
    cmd: python src/stage_01_load_save.py --config=config/config.yaml
    deps:
      - src/stage_01_load_save.py
      - src/utils/all_utils.py
      - config/config.yaml
    outs:
      - artifacts/raw_local_dir/data.csv
dvc repro
git add .
git commit -m "stage 01 added"
git push origin main
dvc add data.csv
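This command will:
- Store a copy of data.csv in the DVC cache (under .dvc/cache); the file itself stays in your workspace.
- Create a .dvc file (data.csv.dvc) that contains the tracking information for your data file.
- Add data.csv to .gitignore so that Git does not track the data itself.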
Commit the changes to Git, including the DVC file and the .gitignore updates.
git add data.csv.dvc .gitignore
git commit -m "Add data.csv to DVC"
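To make the tracking information more concrete, the sketch below computes the kind of metadata a .dvc file records for a tracked file: an MD5 content hash, the size in bytes, and the path. This is a simplified illustration rather than DVC's implementation; the helper name `describe_file` is hypothetical, and the exact fields vary between DVC versions.

```python
import hashlib
import os
import tempfile


def describe_file(path: str) -> dict:
    """Return hash/size/path metadata similar in spirit to a .dvc file entry."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return {
        "md5": md5.hexdigest(),
        "size": os.path.getsize(path),
        "path": os.path.basename(path),
    }


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.csv")
    with open(data_path, "wb") as f:
        f.write(b"hello")  # placeholder content for the demo
    entry = describe_file(data_path)

print(entry)
```

Because Git only versions this small metadata file, the data itself can live in the DVC cache and in remote storage.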
To ensure that your data is backed up and can be retrieved later, configure a remote storage (this could be S3, Google Drive, SSH, etc.). Here’s an example using an S3 bucket:
dvc remote add -d myremote s3://mybucket/path
Alternatively, you can use a local directory as the remote storage. For example, to use a directory named .dvcstore within your project folder:
dvc remote add -d localremote .dvcstore
dvc push
When you make changes to data.csv, you need to re-add it to DVC and commit the changes.
dvc add data.csv
git add data.csv.dvc
git commit -m "Update data.csv with new changes"
dvc push
If you want to go back to a previous version of your data, use Git to check out the corresponding commit and then use DVC to restore the matching data file.
git log # Find the commit hash you want to revert to
git checkout <commit-hash>
dvc checkout
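The update and rollback cycle relies on content-addressed storage: each version of the data is stored under its hash, and the small pointer committed to Git says which hash to restore. The toy sketch below imitates that idea, with an in-memory dict standing in for the cache and a list standing in for the Git history. `ToyStore` and its methods are hypothetical names, not DVC APIs.

```python
import hashlib


class ToyStore:
    """In-memory stand-in for a content-addressed cache (not DVC's API)."""

    def __init__(self) -> None:
        self.objects = {}  # maps MD5 hex digest -> file content (bytes)

    def add(self, content: bytes) -> str:
        """Store content under its MD5 hash and return the hash (the pointer)."""
        digest = hashlib.md5(content).hexdigest()
        self.objects[digest] = content
        return digest

    def checkout(self, digest: str) -> bytes:
        """Return the content that a pointer refers to."""
        return self.objects[digest]


store = ToyStore()
history = []  # plays the role of Git commits holding data.csv.dvc

history.append(store.add(b"a,b\n1,2\n"))       # like: dvc add + git commit
history.append(store.add(b"a,b\n1,2\n3,4\n"))  # like: re-add after an update

# like: git checkout <first commit> followed by dvc checkout
restored = store.checkout(history[0])
print(restored)
```

Both versions remain in the store, so switching between them only requires knowing which pointer to follow.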