Data Labeling in Machine Learning with Python, by Packt Publishing

This is the code repository for Data Labeling in Machine Learning with Python, published by Packt.

Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models

What is this book about?

Data labeling is the invisible hand that guides the power of artificial intelligence and machine learning. In today’s data-driven world, mastering data labeling is not just an advantage, it’s a necessity. Data Labeling in Machine Learning with Python empowers you to unearth value from raw data, create intelligent systems, and influence the course of technological evolution.

This book covers the following exciting features:

  • Excel in exploratory data analysis (EDA) for tabular, text, audio, video, and image data
  • Understand how to use Python libraries to apply rules to label raw data
  • Discover data augmentation techniques for adding classification labels
  • Leverage K-means clustering to classify unsupervised data
  • Explore how hybrid supervised learning is applied to add labels for classification
  • Master text data classification with generative AI
  • Detect objects and classify images with OpenCV and YOLO
  • Uncover a range of techniques and resources for data annotation
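As an illustration of the rule-based labeling mentioned in the list above, here is a minimal sketch using only the Python standard library. It is not taken from the book's notebooks: the rule functions, the label names ("complaint", "praise", "question"), and the majority-vote strategy are all assumptions chosen for the example.

```python
from collections import Counter

# Sentinel returned by a rule that has no opinion about a text.
ABSTAIN = None

# Hypothetical keyword rules; real projects would tune these to their data.
def rule_mentions_refund(text):
    return "complaint" if "refund" in text.lower() else ABSTAIN

def rule_says_thanks(text):
    return "praise" if "thank" in text.lower() else ABSTAIN

def rule_ends_with_question_mark(text):
    return "question" if text.strip().endswith("?") else ABSTAIN

RULES = [rule_mentions_refund, rule_says_thanks, rule_ends_with_question_mark]

def label_text(text, rules=RULES):
    """Apply every rule and return the most common non-abstaining label."""
    votes = [rule(text) for rule in rules]
    votes = [vote for vote in votes if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN  # no rule fired, so the text stays unlabeled
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    for sample in ["I want a refund now", "Where is my order?", "Package arrived"]:
        print(sample, "->", label_text(sample))
```

Texts for which no rule fires stay unlabeled, which makes it easy to see how much of the raw data the rules actually cover.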

If you feel this book is for you, get your copy today!

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

storage:
  backend: MINIO
  minio:
    bucket: pachyderm

What this book covers: The book starts with an introduction to exploratory data analysis using Python libraries, then covers data labeling for tabular, text, image, and audio data using heuristics, semi-supervised learning, unsupervised learning, and data augmentation. Finally, it looks at industry best practices and tools for data labeling.
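To make the unsupervised-learning approach concrete, the following is a rough sketch of K-means used to assign cluster labels to unlabeled one-dimensional values. It is written in plain Python so that it runs without extra packages; the function name `kmeans_1d`, the sample values, and the starting centroids are assumptions for illustration, and the book's own code may well rely on a library instead.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values with K-means, starting from the given centroids.

    Returns (labels, centroids), where labels[i] is the index of the
    centroid closest to values[i] after the final update.
    """
    centroids = list(centroids)

    def nearest(value):
        # Index of the centroid with the smallest absolute distance.
        return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))

    for _ in range(iterations):
        # Assignment step: group each value with its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            clusters[nearest(value)].append(value)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]

    labels = [nearest(value) for value in values]
    return labels, centroids

if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
    labels, centers = kmeans_1d(data, centroids=[0.0, 5.0])
    print(labels)
    print(centers)
```

The cluster indices are arbitrary identifiers, so a person still has to inspect each cluster and decide which class name it corresponds to.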

With the following software and hardware list, you can run all the code files present in the book (Chapters 1-7).

Software and Hardware List

Chapter | Software required | OS required
1-7 | AWS CLI (aws) | Any OS
1-7 | Red Hat OpenShift Client (oc) | Any OS

Get to Know the Author

Vijaya Kumar Suda is a seasoned data and AI professional boasting over two decades of expertise collaborating with global clients. Having resided and worked in diverse locations such as Switzerland, Belgium, Mexico, Bahrain, India, Canada, and the USA, Vijaya has successfully assisted customers spanning various industries. Currently serving as a senior data and AI consultant at Microsoft, he is instrumental in guiding industry partners through their digital transformation endeavors using cutting-edge cloud technologies and AI capabilities. His proficiency encompasses architecture, data engineering, machine learning, generative AI, and cloud solutions.