Resource management tool for running distributed TensorFlow, built on the KernelHive manager libraries.
GPU monitoring example:
- Install the KernelHive Python libraries:
pip install kernelhive
- Run the monitoring daemon:
python gpu_monitoring.py
- Add nodes that should be monitored:
cd scripts && ./add_node.sh
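
The three steps above can be wrapped in small shell functions, which makes it possible to preview the commands before running them. This is only a sketch: the names `run`, `install_libs`, `start_daemon`, `add_nodes` and the `DRY_RUN` variable are hypothetical helpers introduced here and are not part of KernelHive. It assumes the commands are run from the repository root and that `add_node.sh` takes no arguments, as in the example above.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps listed above (not part of KernelHive).

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: install the KernelHive Python libraries.
install_libs() {
    run pip install kernelhive
}

# Step 2: start the monitoring daemon.
# Assumption: the daemon may stay in the foreground, in which case
# step 3 has to be run from a second terminal.
start_daemon() {
    run python gpu_monitoring.py
}

# Step 3: register the nodes to monitor; the script is run from inside scripts/.
add_nodes() {
    run sh -c 'cd scripts && ./add_node.sh'
}
```

Sourcing this file and calling `install_libs`, `start_daemon` and `add_nodes` with `DRY_RUN=1` set only prints the commands; without it, each function runs the corresponding step.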