Distributed Machine Learning with Python

This is the code repository for Distributed Machine Learning with Python, published by Packt.

Accelerating model training and serving with distributed systems

What is this book about?

Reducing time cost in machine learning leads to shorter waiting times for model training and a faster model updating cycle. Distributed machine learning enables machine learning practitioners to shorten model training and inference time by orders of magnitude.

This book covers the following exciting features:

  • Deploy distributed model training and serving pipelines
  • Get to grips with the advanced features in TensorFlow and PyTorch
  • Mitigate system bottlenecks during in-parallel model training and serving
  • Discover the latest techniques on top of classical parallelism paradigms
  • Explore advanced features in Megatron-LM and Mesh-TensorFlow
  • Use state-of-the-art hardware such as NVLink, NVSwitch, and GPUs
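
As a taste of the first topic above, here is a minimal sketch of data-parallel training with PyTorch's DistributedDataParallel. It is illustrative only, not an excerpt from the book: it runs a single-process "gloo" process group on CPU, and the model and synthetic data are hypothetical stand-ins; real jobs launch one process per GPU or node.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group for illustration; real jobs use one process per GPU/node
# (e.g. launched via torchrun) and typically the "nccl" backend on GPUs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# DDP wraps the model; gradients are all-reduced across workers on backward()
model = DDP(nn.Linear(8, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Hypothetical synthetic data standing in for a real sharded dataset
x, y = torch.randn(32, 8), torch.randn(32, 1)
for _ in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()  # DDP synchronizes gradients here
    opt.step()

final_loss = loss.item()
dist.destroy_process_group()
```

With more than one worker, each process would hold a shard of the data, and DDP's gradient all-reduce keeps the model replicas in sync after every step.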

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

# Install the client library first: pip install azure-ai-textanalytics
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Connect to the API through a subscription key and endpoint
subscription_key = "<your-subscription-key>"
endpoint = "https://<your-cognitive-service>.cognitiveservices.azure.com/"

# Authenticate
credential = AzureKeyCredential(subscription_key)
cog_client = TextAnalyticsClient(endpoint=endpoint, credential=credential)

Following is what you need for this book: This book is for data scientists, machine learning engineers, and ML practitioners in both academia and industry. A fundamental understanding of machine learning concepts and a working knowledge of Python programming are assumed. Prior experience implementing ML/DL models with TensorFlow or PyTorch will be beneficial. You'll find this book useful if you are interested in using distributed systems to boost machine learning model training and serving speed.

Software and Hardware List

Chapter   Software required    OS required
1-12      PyTorch              Windows, Mac OS X, and Linux (Any)
1-12      TensorFlow           Windows, Mac OS X, and Linux (Any)
1-12      Python               Windows, Mac OS X, and Linux (Any)
1-12      CUDA/C               Windows, Mac OS X, and Linux (Any)
1-12      NVprofiler/Nsight    Windows, Mac OS X, and Linux (Any)

We assume you have Linux/Ubuntu as your operating system. We assume you use NVIDIA GPUs and have installed the proper NVIDIA driver as well. We also assume you have basic knowledge about machine learning in general and are familiar with popular deep learning models.

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Author

Guanhua Wang is a final-year computer science Ph.D. student in the RISELab at UC Berkeley, advised by Professor Ion Stoica. His research lies primarily in the machine learning systems area, including fast collective communication, efficient in-parallel model training, and real-time model serving. His research has gained significant attention from both academia and industry. He has been invited to give talks at top-tier universities (MIT, Stanford, CMU, and Princeton) and big tech companies (Facebook/Meta and Microsoft). He received his master's degree from HKUST and his bachelor's degree from Southeast University in China. He has also done some cool research on wireless networks. He likes playing soccer and has run multiple half-marathons in the Bay Area of California.