PicoPebble is a lightweight distributed machine learning training framework for beginners. It uses MPI to exchange parameters and gradients across multiple machines, and it can also train on a single machine. PicoPebble currently supports:
- Synchronous training
- Asynchronous training
- Data parallelism
- Pipeline model parallelism
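To make the synchronous, data-parallel mode in the list above concrete, here is a minimal sketch in plain Python. It does not use PicoPebble's API or MPI: the workers are simulated in one process, and the hypothetical helper `all_reduce_mean` stands in for what a collective such as `MPI_Allreduce` would do across real processes. The model (`y = w * x`), the data, and every function name are assumptions made for illustration.

```python
# Simulated synchronous data-parallel SGD (illustrative only, not PicoPebble code).
# Each "worker" is one list entry; in a real MPI job each would be a process.

def local_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for an allreduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train_sync(shards, lr=0.02, steps=100):
    """Run synchronous SGD with one model replica per shard."""
    weights = [0.0] * len(shards)
    for _ in range(steps):
        # 1. Every worker computes a gradient on its own shard.
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # 2. Gradients are averaged; all workers wait for this step.
        averaged = all_reduce_mean(grads)
        # 3. Every replica applies the same update, so replicas stay identical.
        weights = [w - lr * g for w, g in zip(weights, averaged)]
    return weights


# Toy dataset generated from y = 3x, split across two simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 7)]
shards = [data[0::2], data[1::2]]
weights = train_sync(shards)
print(weights)
```

Because every replica applies the same averaged gradient, the replicas never diverge. An asynchronous variant would drop the barrier in step 2 and let workers apply updates as they arrive, trading that consistency for less waiting.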
Several features are still under development:
- Tensor model parallelism
- Passing parameters through Gloo
- Disaster recovery
Currently, PicoPebble relies on MPI for parameter synchronization, so you need to install OpenMPI. Please note that you should not install both OpenMPI and MPICH at the same time.
# Fedora / CentOS / RHEL
sudo yum install openmpi-devel -y
# Debian / Ubuntu
sudo apt install openmpi-bin libopenmpi-dev
# Arch Linux
sudo pacman -S openmpi
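Since OpenMPI and MPICH should not be installed together, it can help to check which launcher is actually on your `PATH` after installation. The following is a rough sketch, not part of PicoPebble: `classify_mpi` and `detect_mpi` are hypothetical helpers, and the matching relies on the assumption that Open MPI's `mpirun --version` mentions "Open MPI" while MPICH's launcher reports "HYDRA" build details.

```python
import shutil
import subprocess


def classify_mpi(version_text):
    """Guess the MPI implementation from the launcher's version output."""
    text = version_text.lower()
    if "open mpi" in text or "open-mpi" in text:
        return "openmpi"
    if "hydra" in text or "mpich" in text:
        return "mpich"
    return "unknown"


def detect_mpi(launcher="mpirun"):
    """Run `<launcher> --version` and classify it; None if it is not on PATH."""
    path = shutil.which(launcher)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            check=False,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    # Some launchers print version details to stderr, so inspect both streams.
    return classify_mpi(result.stdout + result.stderr)
```

Calling `detect_mpi()` returns `"openmpi"`, `"mpich"`, `"unknown"`, or `None` when no launcher is found. Anything other than `"openmpi"` suggests the environment does not match what PicoPebble expects.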
docker build -t picopebble -f Dockerfile .
# for podman
# podman build -t picopebble -f Dockerfile .
# usage: ./build_run.sh <node num>
./build_run.sh 1
./build_run.sh 3