kobiso/CBAM-tensorflow

How to implement distributed deep learning on a small master-slave architecture through a data parallelism approach?

Opened this issue · 0 comments

I am a beginner and I would like to deploy a distributed deep learning model on Hadoop as a toy example. I want to use three personal computers (PCs): one would work as a parameter server and the other two would work as worker machines. I initially want to configure Hadoop over the three machines (I do not know exactly how this would be done). Then I would distribute the data in pieces over the two worker machines via the parameter server machine for training. Suppose I have 10 GB of data: 5 GB would be shifted to the first worker PC and the other 5 GB would be allocated to the second PC. Then I would like to apply the data parallelism model synchronously on the data set. What would be the steps to implement a distributed deep learning system on this small network of machines?
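The setup described above can be sketched in code. This is only a minimal illustration, not part of the original question: it assumes TensorFlow's `TF_CONFIG` environment-variable convention for describing a parameter-server cluster, and the host addresses and the `shard_indices` helper are hypothetical placeholders. The sharding function shows the even 5 GB / 5 GB split between the two workers as an index split.

```python
import json

# Hypothetical cluster layout for the three PCs described above:
# one parameter server ("ps") and two workers. The host:port values
# are placeholders -- replace them with the machines' real addresses.
CLUSTER = {
    "ps": ["192.168.1.10:2222"],
    "worker": ["192.168.1.11:2222", "192.168.1.12:2222"],
}

def tf_config_for(task_type, task_index):
    """Build the TF_CONFIG value that TensorFlow's distributed runtime
    reads on each machine (set it once per machine before training)."""
    return json.dumps({
        "cluster": CLUSTER,
        "task": {"type": task_type, "index": task_index},
    })

def shard_indices(num_examples, num_workers, worker_index):
    """Give each worker an equal contiguous slice of the dataset,
    e.g. 10 GB of data split 5 GB / 5 GB across two workers."""
    per_worker = num_examples // num_workers
    start = worker_index * per_worker
    # The last worker also takes any remainder examples.
    end = start + per_worker if worker_index < num_workers - 1 else num_examples
    return range(start, end)

# Example: what the first worker machine would export and train on.
print(tf_config_for("worker", 0))
print(list(shard_indices(10, 2, 0)))  # first worker's share
print(list(shard_indices(10, 2, 1)))  # second worker's share
```

With synchronous data parallelism, each worker would then compute gradients on its own shard and the parameter server would apply them only after all workers report, so both workers always train against the same model version.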