This section provides a “lightning-quick” guide for early-access Satori users, covering:
- How to obtain your login credentials
- How to connect to the Satori cluster via SSH/SCP/HTTPS
- Installing a Python environment (Anaconda) in your login profile
- Installing Open-CE and/or WMLCE optimized machine learning and deep learning libraries and frameworks in your login profile
- How to start your training jobs: examples of training job submission and management in the Satori cluster
- How to manage distributed deep learning in TensorFlow and PyTorch
- Are you training large models that do not fit into GPU memory? See how to enable Large Model Support (LMS) in your TensorFlow or PyTorch Python scripts.
- Example of Snap Machine Learning (SnapML/pai4sk)
- Additional training (video format) on CUDA Unified Memory, GPUDirect, LSF Workload Manager, POWER9 SMT4, etc.
- Public datasets for machine learning and deep learning on Satori that you can use from day one in your deep learning projects. Using them saves you time and saves us Internet bandwidth :)
- Troubleshooting
- Can’t find what you were looking for? Have a look at the conda cheat sheet and the WMLCE extended documentation here before asking for help.
- Still need help? Email orcd-help-satori@mit.edu