This is a simple script for spinning up new Azure Pipelines agents to run DeepSpeed integration tests. This script installs prerequisites, registers the worker, and begins listening for jobs.
$DEEPSPEED_PAT must store a personal access token (PAT) configured for DeepSpeed's GPU testing pool.
sudo privileges are required to install the agent prerequisite software. You may be prompted for a password once at the beginning of this script execution.
To spin up a worker, simply run:
DEEPSPEED_PAT=mytoken ./prep_test_node.sh
Note: the worker will stop once this script is killed. For continued execution, we strongly recommend you run this script in a tmux or screen environment.
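One way to follow this recommendation is sketched below. The helper name start_agent_session and the session name ds-agent are hypothetical and not part of this repository; the sketch assumes tmux is installed and that it is run from the directory containing prep_test_node.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepSpeed): run the agent script in a
# detached tmux session so it keeps running after you log out.
start_agent_session() {
    local session="${1:-ds-agent}"   # session name is an arbitrary choice

    # Refuse to start without a token; the script needs DEEPSPEED_PAT.
    if [ -z "${DEEPSPEED_PAT:-}" ]; then
        echo "DEEPSPEED_PAT is not set" >&2
        return 1
    fi

    # -d starts the session detached, -s names it. The token is passed
    # explicitly because a running tmux server may not inherit it.
    tmux new-session -d -s "$session" \
        "DEEPSPEED_PAT='$DEEPSPEED_PAT' ./prep_test_node.sh"
}
```

After calling start_agent_session, attach with tmux attach -t ds-agent to answer the sudo password prompt, then detach with Ctrl-b d. The agent should keep running inside the session.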
The testing agent sets up in /tmp/deepspeed-testing/ by default. You can change the base directory by setting the environment variable $DEEPSPEED_TEST_BASE at the time of running:
DEEPSPEED_TEST_BASE=/my/fast/dir DEEPSPEED_PAT=mytoken ./prep_test_node.sh
Note: we recommend you base the testing agent on fast local storage.
DeepSpeed's model tests expect training data to be found under:
/data/Megatron-LM
/data/BingBertSquad
We don't currently provide a way to configure the model test training data location.
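Because the data location is fixed, it can help to confirm that both directories exist before registering a worker. The following is a minimal sketch; check_test_data is a hypothetical helper and is not provided by prep_test_node.sh.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report every directory argument that is
# missing and return non-zero if any of them is.
check_test_data() {
    local missing=0 dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The two locations the model tests expect, as listed above.
check_test_data /data/Megatron-LM /data/BingBertSquad \
    || echo "model tests will not find their training data" >&2
```

If the data lives elsewhere on the node, a symbolic link from the expected path to the real location is one possible workaround.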