This repository provides templates to help you prepare for the Ray Train user tests. During testing, we'll help you scale your machine learning workloads to cloud clusters with multi-node, multi-GPU training.
These instructions will help you prepare your environment on an Anyscale Workspace. Please fork this repository and follow the instructions below, then send your forked GitHub repository URL back to us before user testing.
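For context, the snippet below is a minimal sketch (assuming a recent Ray 2.x API) of how Ray Train's TorchTrainer wraps an existing PyTorch training loop for multi-node, multi-GPU runs; the toy model and the worker/GPU counts are placeholders, not part of this template.

```python
import torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model

def train_loop_per_worker(config):
    # Ray Train runs one copy of this function on each worker and sets up
    # the distributed process group for you.
    model = prepare_model(torch.nn.Linear(10, 1))  # placeholder model, wrapped for DDP
    # ... your existing training loop goes here ...

trainer = TorchTrainer(
    train_loop_per_worker,
    # Placeholder scaling: 4 workers with one GPU each across the cluster.
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
trainer.fit()
```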
Before testing, please make sure that:
- You have a runnable model training script on your local machine.
- You are training with publicly available datasets and models.
- You are using Python >= 3.8.
Please put your data preparation logic in `prepare_data.sh`. For example:
```bash
# Example: download MNIST dataset
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
```
You can skip this step if you load the dataset directly in your training script, e.g.:
```python
from datasets import load_dataset

ds = load_dataset("glue", "cola")
```
Put all the required Python packages in `requirements.txt`. Our team will help you install these dependencies before testing.
```text
# requirements.txt
pytorch-lightning==2.0.3
transformers==4.31.0
```
Finally, put your training scripts into the `./src` folder. Please include any header files, metadata, and configuration files in this folder.
You are also encouraged to provide a simple guide for running your code in `./src/train.sh`, so that we can easily check that your code runs without dependency issues.
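For illustration, a hypothetical `./src/train.sh` might look like the sketch below; the entry point `train.py` and its arguments are placeholders for whatever your project actually uses.

```bash
#!/usr/bin/env bash
# Hypothetical run guide. Adjust the entry point and arguments to your project.
set -e

# Install the Python dependencies listed in requirements.txt (optional here;
# we will also install them before testing).
pip install -r requirements.txt

# Launch a short training run to confirm the script works end to end.
python train.py --epochs 1 --batch-size 32
```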
You have completed all the steps. Remember to send your forked GitHub repository URL back to us before user testing! We look forward to working with you to unleash the power of Ray!