/singa-ps

Primary LanguagePython

Prepare mxnet library:
1. CUDA Dependencies:
   (a) Download and install CUDA toolkit;
   (b) Download and install cuDNN.
2. Download mxnet and compile mxnet into shared libarary 'libmxnet.so'
   (a) $wget http://us.mirrors.quenda.co/apache/incubator/mxnet/1.6.0/apache-mxnet-src-1.6.0-incubating.tar.gz
   (b) $tar xvzf apache-mxnet-src-1.6.0-incubating.tar.gz
   (c) $cd apache-mxnet-src-1.6.0-incubating
   (d) $cp make/config.mk .
   (e) Configure the compile options at config.mk, e.g., the CUDA OPTIONS
   (f) $make -j8 USE_BLAS=openblas
   (g) Wait for a long time, e.g., 3-4 hours
3. Copy the built 'libmxnet.so' from '/lib' to '/python/mxnet'.

*Note that:
  (a)If you want to enable GPUs, the mxnet should compile with GPU options. Please refer to https://mxnet.apache.org/get_started/ubuntu_setup
  (b)The CUDA-version and cuDNN-version of singa and mxnet should be set to the same.

How to configure singa-ps?
1. copy the folder "python/mxnet" from mxnet source directory into the working directory of singa.
2. copy "singa_kvstore.py" and "SingaOpt.py" into the working directory of singa.
3. add a function "from_numpy(ndarray, device_id=0, zero_copy=True)" at the end of "/mxnet/ndarray/ndarray.py". Please see https://github.com/willzhoupan/singa-ps/blob/master/mxnet/ndarray/ndarray.py#L5056

How to use singa-ps?  
1. import singa_kvstore;

2. create kvstore for ps;
   For example:
   #------------------------start example------------------------------------------------------------------
   kv_type = 'dist_sync' # set synchronization mode, e.g., 'dist_async','dist_sync'
   kv = singa_kvstore.create_kvstore(kv_type,
                                     'SingaSGD', # set other options, like 'sgd','adam', can use mxnet optimizer
                                     lr=0.005, 
                                     momentum=0.9, 
                                     weight_decay=1e-5)
   #------------------------end example--------------------------------------------------------------------

3. use kvstore to update parameters after the loss is computed;
   For example:
   #------------------------start example------------------------------------------------------------------
   singa_kvstore.backward_and_update(kv,loss)
   #------------------------end example--------------------------------------------------------------------

(mnist_cnn.py is a detailed example)

How to launch the script?
* The launch of the script is the same as mxnet, for example,
  (1) two worker and one server at the same machine: 
      $python3 ./launch/launch.py -n 2 -s 1 --launcher local python3 mnist_cnn.py
  (2) four worker and two server at different machines:
      $python3 ./launch/launch.py -n 4 -s 2 -H hostfile --launcher ssh python3 mnist_cnn.py