/TensorHive

Lightweight computing resource management tool for distributed TensorFlow executed in clusters with SSH connectivity, based on KernelHive manager framework

Primary LanguagePython

TensorHive

Resource management tool for executing distributed TensorFlow based on KernelHive manager libraries.

GPU monitoring example:

  1. Install KernelHive python libraries:

pip install kernelhive

  1. Run the monitoring daemon:

python gpu_monitoring.py

  1. Add nodes that should be monitored:

cd scripts ./add_node.sh