Distributed-TensorFlow-Example

An example of running a cluster of TensorFlow servers and distributing a computation graph across that cluster.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Requirements

  • Python 3.5.2
  • TensorFlow >= 1.3.0

How to run

Make sure all nodes/hosts (IP addresses and ports) are on the same local area network (LAN). If an IP address is incorrect, TensorFlow raises the error tensorflow.python.framework.errors_impl.UnknownError: Could not start gRPC server
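Besides a wrong IP address, a gRPC server can also fail to start when the chosen port is already taken on the local machine. The check below is a minimal sketch for ruling that out before launching a task. The helper name port_is_free is hypothetical and not part of this repository or of TensorFlow. It only tests whether a TCP socket can be bound locally and says nothing about whether the other hosts are reachable.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) on this machine.

    Hypothetical helper for illustration; not part of distributed.py.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typical causes: the port is in use, or the address is not local.
            return False
    return True


if __name__ == "__main__":
    # 10000 is the parameter server port used in the commands in this README.
    print("port 10000 free:", port_is_free(10000))
```

Run it on each host with the port that host is assigned in --ps_hosts or --worker_hosts.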

Parameter Server (ps):

CUDA_VISIBLE_DEVICES='' python distributed.py --ps_hosts=192.168.1.203:10000 --worker_hosts=192.168.1.202:10001 --job_name=ps --task_index=0

Worker Server:

CUDA_VISIBLE_DEVICES='' python distributed.py --ps_hosts=192.168.1.203:10000 --worker_hosts=192.168.1.202:10001 --job_name=worker --task_index=0
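The contents of distributed.py are not shown here, so the following is only a sketch of how flags like these are commonly mapped onto the TensorFlow 1.x cluster API (tf.train.ClusterSpec and tf.train.Server). The function names parse_cluster and start_server are hypothetical, and the actual script may be organized differently.

```python
import argparse


def parse_cluster(argv=None):
    """Parse the command-line flags into a cluster dict, job name, and task index."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", required=True,
                        help="comma-separated host:port list of parameter servers")
    parser.add_argument("--worker_hosts", required=True,
                        help="comma-separated host:port list of workers")
    parser.add_argument("--job_name", required=True, choices=["ps", "worker"])
    parser.add_argument("--task_index", type=int, default=0)
    args = parser.parse_args(argv)
    cluster = {
        "ps": args.ps_hosts.split(","),
        "worker": args.worker_hosts.split(","),
    }
    return cluster, args.job_name, args.task_index


def start_server(cluster, job_name, task_index):
    """Start this task's server (assumes TensorFlow 1.x is installed)."""
    import tensorflow as tf  # imported lazily so parsing works without TensorFlow

    spec = tf.train.ClusterSpec(cluster)
    server = tf.train.Server(spec, job_name=job_name, task_index=task_index)
    if job_name == "ps":
        # A parameter server only hosts variables, so it blocks here.
        server.join()
    return server


if __name__ == "__main__":
    cluster, job_name, task_index = parse_cluster()
    start_server(cluster, job_name, task_index)
```

Every task receives the same --ps_hosts and --worker_hosts values; only --job_name and --task_index differ, which is how each process knows which entry in the cluster it is.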

Reference

Distributed TensorFlow

Deploy - TensorFlow

How to run TensorFlow on Hadoop

thewintersun/distributeTensorflowExample