A minimal, industrial-grade implementation of GPT training (multi-node, DDP, FSDP).
We use the official Ubuntu Docker image with version tag 22.04 (see ubuntu Docker Official Image), which can be pulled directly:
docker pull ubuntu:22.04
docker run -it -d --net host --runtime=nvidia --gpus all --name simple_ubuntu ubuntu:22.04 bash
docker exec -it simple_ubuntu /bin/bash
To limit the CUDA devices visible inside the container, change the --gpus flag to a count such as 1, or specify the devices explicitly (Specialized Configurations with Docker: GPU Enumeration). For example, to use a single GPU in the container:
docker run -it -d --net host --runtime=nvidia --gpus 1 --name simple_ubuntu ubuntu:22.04 bash
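The three forms the --gpus flag accepts (all, a count, or an explicit device list) can be sketched with a small helper. This is only an illustration: the function names gpus_value and print_run_cmd are hypothetical and not part of Docker, and the helper prints the command rather than running it. The device-list form assumes the quoting described in the GPU Enumeration documentation, where the double quotes are part of the value passed to Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Docker): build the value for `docker run --gpus`.
#   gpus_value all           -> all
#   gpus_value count 1       -> 1             (any one GPU)
#   gpus_value devices 0,2   -> "device=0,2"  (the double quotes are part of the value)
gpus_value() {
  case "$1" in
    all)     printf 'all\n' ;;
    count)   printf '%s\n' "$2" ;;
    devices) printf '"device=%s"\n' "$2" ;;
    *)       echo "usage: gpus_value all | count N | devices IDX[,IDX...]" >&2
             return 1 ;;
  esac
}

# Print (do not execute) the matching docker run command, reusing the
# container name and image from the text above.
print_run_cmd() {
  local value
  value=$(gpus_value "$@") || return 1
  # Single quotes keep the inner double quotes intact when the line is pasted into a shell.
  printf "docker run -it -d --net host --runtime=nvidia --gpus '%s' --name simple_ubuntu ubuntu:22.04 bash\n" "$value"
}

print_run_cmd all
print_run_cmd count 1
print_run_cmd devices 0,2
```

Because the container name simple_ubuntu is reused, any earlier container with that name has to be removed first (docker rm -f simple_ubuntu) before one of the printed commands is run. Once inside the container, running nvidia-smi is a quick way to check which GPUs are actually visible.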