This research demonstrates distributed CNN training on a cluster. Target platforms are the Jetson TK1 and x86 CPUs.
- To prepare the dataset:
python dataset.py
- To train the network:
python train.py
- To run distributed training on an emulated cluster (one host):
./cluster_emulated.sh
- To run distributed training on the cluster:
./cluster_distributed.sh
Running the code in Docker can cause problems with mpi4py. If that happens, replace the distribution package with a pip build against OpenMPI:
apt-get remove python-mpi4py
apt-get install libopenmpi-dev
pip install mpi4py
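
After reinstalling, it may help to confirm that mpi4py imports and initializes before launching the cluster scripts. The snippet below is a minimal sketch of such a check and is not part of this repository: the function name `check_mpi4py` and the file name `check_mpi.py` are hypothetical. It only uses `MPI.COMM_WORLD` and `MPI.get_vendor()` from mpi4py.

```python
import importlib


def check_mpi4py():
    """Return (ok, message) telling whether mpi4py can be imported and used.

    Hypothetical helper for illustration; not part of this repository.
    """
    try:
        # Importing mpi4py.MPI loads the MPI shared library and initializes MPI.
        mpi = importlib.import_module("mpi4py.MPI")
        comm = mpi.COMM_WORLD
        vendor, version = mpi.get_vendor()
        version_text = ".".join(str(part) for part in version)
        return True, "rank %d of %d (%s %s)" % (
            comm.Get_rank(), comm.Get_size(), vendor, version_text)
    except Exception as exc:
        # A missing package or a broken MPI build both end up here.
        return False, "mpi4py is not usable: %s" % exc


if __name__ == "__main__":
    ok, message = check_mpi4py()
    print(message)
```

Saved as, say, `check_mpi.py`, it can be run with `python check_mpi.py`. Running it under `mpirun -n 2 python check_mpi.py` should print two different ranks if the MPI launcher and mpi4py are linked against the same MPI installation.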