This project, "AWS Neuron Reference for NeMo Megatron", includes modified versions of the open-source packages NeMo and Apex, adapted for use with AWS Neuron and AWS EC2 Trn1 instances.
Please refer to the neuronx-nemo-megatron GPT-3 pretraining tutorial for instructions on how to use the code in this repository.
The following instructions have been verified with this Docker image:
763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.12.0-ubuntu20.04
These instructions should be updated periodically as new Docker images are released. The latest Docker images are listed at https://github.com/aws/deep-learning-containers/blob/master/available_images.md
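When these instructions are moved to a newer image, the registry host and region in the ECR login command must match that image's URI. The following is a minimal sketch, assuming the image is hosted in ECR under the standard `<account>.dkr.ecr.<region>.amazonaws.com` host format. The helpers `ecr_registry`, `ecr_region`, and `ecr_login` are hypothetical, not part of this repository. They derive both values from an image URI (defaulting to the verified image above) and run the same `aws ecr get-login-password | docker login` command used in the authentication step.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository) for keeping the ECR login
# in sync with whichever base image the instructions currently target.
# Assumption: the registry host has the form <account>.dkr.ecr.<region>.amazonaws.com

# Defaults to the image these instructions were verified on; override by exporting BASE_IMAGE.
BASE_IMAGE="${BASE_IMAGE:-763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.12.0-ubuntu20.04}"

# Registry host = everything before the first "/" in the image URI.
ecr_registry() {
  printf '%s\n' "${1%%/*}"
}

# Region = 4th dot-separated field of the registry host.
ecr_region() {
  ecr_registry "$1" | cut -d. -f4
}

# Log Docker in to the registry that hosts the given image.
ecr_login() {
  local registry region
  registry="$(ecr_registry "$1")"
  region="$(ecr_region "$1")"
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "$registry"
}
```

After sourcing these functions, `ecr_login "$BASE_IMAGE"` expands to exactly the login command shown in the authentication step for the verified image. For a newer image, only `BASE_IMAGE` needs to change.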
- Authenticate with AWS ECR to access the latest Docker image:
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com
- Build the Docker image:
docker build -t neuron-nemo-megatron:dev .
- You can now use this Docker image to run neuron-nemo-megatron code, for example:
cd /workspace/nemo/nemo/examples/nlp/language_modeling && ./test_llama.sh
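The steps above do not show how to start a container from the built image. The following is a minimal sketch, assuming a Trn1 host whose Neuron devices appear as `/dev/neuron0`, `/dev/neuron1`, ... and that the `/workspace/...` path refers to the container's filesystem. The helper `run_in_container` and the `DRY_RUN` variable are hypothetical. The helper passes each Neuron device node through with `docker run --device` and runs a command in the `neuron-nemo-megatron:dev` image. Setting `DRY_RUN=1` prints the `docker run` command instead of executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): run a command inside the
# locally built image with every host Neuron device exposed to the container.
# Assumption: Neuron devices are exposed on the host as /dev/neuron0, /dev/neuron1, ...

IMAGE="neuron-nemo-megatron:dev"   # tag produced by the docker build step

run_in_container() {
  local cmd="${1:?usage: run_in_container COMMAND}"
  local -a device_flags=()
  local dev
  # Collect a --device flag for each Neuron device node present on the host.
  for dev in /dev/neuron*; do
    if [ -e "$dev" ]; then
      device_flags+=("--device=$dev")
    fi
  done
  if [ "${#device_flags[@]}" -eq 0 ]; then
    echo "warning: no /dev/neuron* devices found on this host" >&2
  fi
  # ${arr[@]+...} keeps an empty array safe under `set -u` in older bash versions.
  local -a docker_cmd=(docker run --rm ${device_flags[@]+"${device_flags[@]}"} "$IMAGE" bash -c "$cmd")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command (space-joined) instead of running it.
    printf '%s\n' "${docker_cmd[*]}"
  else
    "${docker_cmd[@]}"
  fi
}
```

For example, `run_in_container 'cd /workspace/nemo/nemo/examples/nlp/language_modeling && ./test_llama.sh'` runs the test script from the last step. Passing individual `--device` flags avoids running the container with `--privileged`. Multi-node runs likely need additional options that this sketch does not cover.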