- ml-pytorch is no longer being devleoped and will be removed. Please switch to docker-base-ml.
You should use this image if you
- need to run PyTorch on an Nvidia CUDA 11.8.0 GPU device
- want to use fish shell
- don't want to re-install a virtualenv everytime
- want to run large ML training job on a remote server via SSH
- want to deploy a base image to a cloud computing provider (eg: runpod.io)
You shouldn't use this image
- if you want to run PyTorch on Apple Silicon
- in an otherwise unsecured environment (because it adds a
sudo
-privileged user with plaintext password) - if you want to use bash shell
- if you want to run jupyter notebook
- if you need a pinned python virtualenv
This image has the following installed
git
,rsync
,ssh
, and other Linux utilitiesfish
is set to default for the userneo
(see below)python3
andpip
python3.9
andpip
- updated
conda
base environment - Virtualfish is available as
vf
- A
vf
environment calledpytorch
which containsnumpy
,pandas
,matplotlib
,seaborn
scikit-learn
,scipy
torch
,torchtext
transformers
,datasets
,sentencepiece
wandb
,wbutils
ipython
,jupyter
pytest
,coverage
, and more
Use pytorch
virtualenv as follows
# You should be in the (default) fish shell (not bash)
vf activate pytorch
ipython
# Run your python code
# Exit ipython
vf deactivate
This image creates a sudo
-privileged user for you. You should be able to
do everything (including installing packages using apt-get
) as this user
without having to become root
.
Key | Value |
---|---|
Username | neo |
Password | agentsmith |
docker pull ankurio/docker-base-ml:latest
docker run -it ankurio/docker-base-ml:latest /usr/bin/fish
# You need to login even though the image is public
docker login docker.pkg.github.com
Username: # type your GitHub username
Password: # type your GitHub password (not token)
# Login Succeeded <- Success!
# Now, you can pull the image
docker pull ghcr.io/ankur-gupta/docker-base-ml:latest
docker run -it ghcr.io/ankur-gupta/docker-base-ml /usr/bin/fish
Alternatively, you can clone the repository and build it yourself. It will take ~10 minutes to build the first time.
# Clone the repository
git clone git@github.com:ankur-gupta/docker-base-ml.git
# From $REPO_ROOT
cd $REPO_ROOT
docker build . -t docker-base-ml
docker run -it docker-base-ml /usr/bin/fish
# This command assumes that you pulled the image from DockerHub. Modify if you got the image
# in other ways.
docker run -it ankurio/docker-base-ml:latest /usr/bin/fish
# Execute these within the docker container
# Use the installed `pytorch` virtualenv
neo@5e50bd3d637b ~> vf activate pytorch
neo@5e50bd3d637b ~> (pytorch) ipython
# Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
# Type 'copyright', 'credits' or 'license' for more information
# IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import torch
In [2]: torch.zeros(3)
Out[2]: tensor([0., 0., 0.])
# Run your pytorch code here
# ...
# ...
# Exit ipython
exit # or press Control+D
# Deactivate virtualenv
vf deactivate
# Exit the docker container
exit # or press Control+D
A RUN echo "print something"
line in Dockerfile
will be useful if you use "plain progress"
docker build . -t docker-base-ml --progress=plain
This does not delete the docker cache. From this StackOverflow post.
# Delete all containers - do this first!
docker rm $(docker ps -a -q)
# Delete all images
docker rmi $(docker images -q)
If you encounter this type of error,
ERROR: failed to solve: failed to register layer: write /usr/local/cuda-11.8/targets/sbsa-linux/lib/libcublasLt_static.a: no space left on device
it means that you may need to free disk space. You can do it using these commands.
docker system prune -a # this will delete the cache
# WARNING! This will remove:
# - all stopped containers
# - all networks not used by at least one container
# - all images without at least one container associated to them
# - all build cache
# Are you sure you want to continue? [y/N] y
# Deleted build cache objects:
# kukqetqe618pmdgf5vz1npyn6
# ...
# gdculirrh0mr1g2x2qa7llunk
# Total reclaimed space: 21.19GB
docker volume prune
# WARNING! This will remove anonymous local volumes not used by at least one container.
# Are you sure you want to continue? [y/N] y
# Total reclaimed space: 0B
docker network prune
# WARNING! This will remove all custom networks not used by at least one container.
# Are you sure you want to continue? [y/N] y
You can check docker's space usage like this
docker system df
# TYPE TOTAL ACTIVE SIZE RECLAIMABLE
# Images 2 1 7.192GB 7.192GB (100%)
# Containers 1 0 1.621kB 1.621kB (100%)
# Local Volumes 0 0 0B 0B
# Build Cache 88 3 11.83kB 10.54kB