proyecto_xd

Tools used in this project

Project structure

.
├── config                      
│   ├── main.yaml                   # Main configuration file
│   ├── model                       # Configurations for training model
│   │   ├── model1.yaml             # First variation of parameters to train model
│   │   └── model2.yaml             # Second variation of parameters to train model
│   └── process                     # Configurations for processing data
│       ├── process1.yaml           # First variation of parameters to process data
│       └── process2.yaml           # Second variation of parameters to process data
├── data            
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   ├── raw                         # raw data
│   └── raw.dvc                     # DVC file of data/raw
├── docs                            # documentation for your project
├── dvc.yaml                        # DVC pipeline
├── .flake8                         # configuration for flake8 - a Python linting tool
├── .gitignore                      # files that should not be committed to Git
├── Makefile                        # store useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # configuration for black
├── requirements.txt                # requirements for pip
├── README.md                       # describe your project
├── src                             # store source code
│   ├── __init__.py                 # make src a Python package
│   ├── process.py                  # process data before training model
│   └── train_model.py              # train model
└── tests                           # store tests
    ├── __init__.py                 # make tests a Python package
    ├── test_process.py             # test functions for process.py
    └── test_train_model.py         # test functions for train_model.py
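
The layout above pairs each module in src with a test module in tests. As a minimal sketch of that pairing, the block below shows a hypothetical processing function next to a pytest-style test for it. The name `drop_incomplete_rows` and its behavior are assumptions for illustration, not part of this project, and pytest is assumed as the test runner. In the repository the function would live in src/process.py and the test in tests/test_process.py.

```python
# Hypothetical example: in the project, this function would go in src/process.py.
def drop_incomplete_rows(rows):
    """Keep only the rows in which no value is None or an empty string."""
    return [
        row
        for row in rows
        if all(value not in (None, "") for value in row.values())
    ]


# Hypothetical example: in the project, this test would go in tests/test_process.py.
def test_drop_incomplete_rows():
    rows = [
        {"id": 1, "age": 30},
        {"id": 2, "age": None},
        {"id": 3, "age": ""},
    ]
    assert drop_incomplete_rows(rows) == [{"id": 1, "age": 30}]
```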

Set up the environment

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install dependencies from requirements.txt:

pip install -r requirements.txt

Set up Git:

make setup_git

Install new packages

To install new PyPI packages, run:

pip install <package-name>

Run the entire pipeline

To run the entire pipeline, type:

dvc repro
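
DVC's reproduce command reruns only the stages whose dependencies have changed since the last run, which it detects by comparing content hashes against those recorded in dvc.lock. The sketch below illustrates that idea in a simplified form. It is not DVC's implementation: the function names, the JSON lock file, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def current_hashes(deps: list[Path]) -> dict[str, str]:
    """Map each dependency path to the hash of its current contents."""
    return {str(path): file_hash(path) for path in deps}


def stage_is_stale(deps: list[Path], lock_file: Path) -> bool:
    """Return True if the stage has never run or any dependency has changed."""
    if not lock_file.exists():
        return True
    recorded = json.loads(lock_file.read_text())
    return current_hashes(deps) != recorded


def record_hashes(deps: list[Path], lock_file: Path) -> None:
    """Store the current dependency hashes after a successful stage run."""
    lock_file.write_text(json.dumps(current_hashes(deps)))
```

A stage would be rerun when `stage_is_stale` returns True, and `record_hashes` would be called once it finishes, so that an unchanged stage is skipped on the next run.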

Version your data

Read this article on how to use DVC to version your data.

Start by setting up remote storage, which is where DVC keeps your data. Supported backends include DagsHub, Google Drive, Amazon S3, Azure Blob Storage, Google Cloud Storage, Aliyun OSS, SSH, HDFS, and HTTP.

dvc remote add -d remote <REMOTE-URL>

Commit the config file:

git commit .dvc/config -m "Configure remote storage"

Push the data to remote storage:

dvc push 

Add and push all changes to Git:

git add .
git commit -m 'commit-message'
git push origin <branch>
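
The commands in this section can also be collected into a small script. The sketch below builds the same command sequence and, by default, only prints it. The function names, the `configure_remote` flag, and the example URL, message, and branch are hypothetical. The flag exists on the assumption that the remote only needs to be configured once.

```python
import subprocess


def versioning_commands(remote_url, message, branch, configure_remote=True):
    """Build the DVC and Git commands from this section as argument lists."""
    commands = []
    if configure_remote:
        # One-time setup: register the default remote and commit the config.
        commands.append(["dvc", "remote", "add", "-d", "remote", remote_url])
        commands.append(
            ["git", "commit", ".dvc/config", "-m", "Configure remote storage"]
        )
    commands.append(["dvc", "push"])
    commands.append(["git", "add", "."])
    commands.append(["git", "commit", "-m", message])
    commands.append(["git", "push", "origin", branch])
    return commands


def run_all(commands, dry_run=True):
    """Print each command, or execute it and stop at the first failure."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder values for illustration only; nothing is executed in dry-run mode.
    run_all(versioning_commands("s3://example-bucket/dvc-store", "Update data", "main"))
```

Pass `dry_run=False` to execute the commands. With `check=True`, the script stops as soon as one of them fails.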

Auto-generate API documentation

To auto-generate API documentation for your project, run:

make docs
