Learn Hugging Face 🤗 (work in progress)

I'd like to learn the Hugging Face ecosystem better (transformers, datasets, accelerate + more).

So this repo is to help me learn it and simultaneously teach others.

Each example will take an end-to-end approach: start with a dataset (custom or existing), build and evaluate a model, and create a demo to share.

Teaching style:

A machine learning cooking show! 👨‍🍳

Mottos:

  • If in doubt, run the code. - Machine learning is very experimental. So it's good to get in the habit of continually trying things (even if you think they won't work).
  • Visualize, visualize, visualize! - If you're not sure of some dataset or some operation or some predictions, visualize it/them.
  • Experiment, experiment, experiment! - Again, machine learning is very experimental. So keep trying different things!
  • Data, model, demo! - Create/get a dataset, build/train/evaluate a model, create a demo to share.

Project style:

Data, model, demo!

  • Create a new/reuse an existing dataset.
  • Train/evaluate a model.
  • Build a demo to share.

This will be our (rough) workflow:

The diagram shows the Hugging Face model development workflow, which includes the following steps:

  • Start with an idea or problem.
  • Get data ready (turn it into tensors/create data splits).
  • Pick a pretrained model (to suit your problem).
  • Train/fine-tune the model on your custom data.
  • Evaluate the model.
  • Improve through experimentation.
  • Save and upload the fine-tuned model to the Hugging Face Hub.
  • Turn your model into a shareable demo.

Tools used in this workflow: Datasets/Tokenizers, Transformers/PEFT/Accelerate/timm, Evaluate, and Hub/Spaces/Gradio.

A general Hugging Face workflow from idea to shared model and demo using tools from the Hugging Face ecosystem. These kinds of workflows are not set in stone and are more of a guide than specific directions. See information on each of the tools in the Hugging Face documentation.
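
As a very rough sketch of what the steps above can look like in code (the dataset "imdb" and the model "distilbert-base-uncased" below are only common placeholders, not the ones used in this repo's projects):

```python
# A minimal sketch of the workflow above, assuming a text classification problem.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# 1. Get data ready (download a dataset from the Hub and turn the text into tensors)
dataset = load_dataset("imdb")  # placeholder dataset, swap in your own
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
tokenized_dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True),
    batched=True)

# 2. Pick a pretrained model to suit the problem
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# 3. Train/fine-tune the model on the custom data
training_args = TrainingArguments(output_dir="my_text_classifier",
                                  num_train_epochs=1)
trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=tokenized_dataset["train"],
                  eval_dataset=tokenized_dataset["test"],
                  tokenizer=tokenizer)  # passing the tokenizer enables dynamic padding
trainer.train()

# 4. Evaluate the model
print(trainer.evaluate())

# 5. Save and upload the fine-tuned model to the Hugging Face Hub
# (requires being logged in, e.g. via `huggingface-cli login`)
trainer.push_to_hub()
```

From there, the uploaded model can be turned into a shareable demo with Gradio and hosted on Spaces.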

Contents

All code and text will be free/open-source; step-by-step video walkthroughs are available as a paid upgrade.

Project: Text classification
Description: Build "Food Not Food", a text classification model to classify image captions into "food" if they're about food or "not_food" if they're not. This is the ideal place to get started if you've never used the Hugging Face ecosystem (see the inference sketch below).
Links: Dataset | Model | Demo | Video Course
More to come soon! Let me know if you'd like to see anything specific by leaving an issue.
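
As an example of what the end result of a project like "Food Not Food" can look like, a fine-tuned text classification model uploaded to the Hub can be used for inference via a pipeline (the model ID below is a placeholder, swap in the actual uploaded model):

```python
from transformers import pipeline

# Placeholder model ID -- replace with a real fine-tuned model on the Hub
food_classifier = pipeline(task="text-classification",
                           model="your-username/food_not_food_text_classifier")

print(food_classifier("A photo of a plate of scrambled eggs and sourdough toast"))
# Example output (labels/scores depend on training):
# [{'label': 'food', 'score': 0.99}]
```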

Who is it for?

Ideal for:

  • Beginners who love things explained in detail.
  • Someone who wants to create more of their own end-to-end machine learning projects.

Not ideal for:

  • People with 2-3+ years of machine learning projects & experience^.

^Note: This being said, you may actually find some things helpful along the way. Best to explore and see!

Prerequisites

What is Hugging Face?

Hugging Face is a platform that offers access to many different kinds of open-source machine learning models and datasets.

They're also the creators of the popular transformers library (and many more helpful libraries), a Python library for working with pretrained models as well as custom models.
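
For example, downloading a pretrained model and tokenizer from the Hugging Face Hub only takes a few lines of code (the model name below is just a common example):

```python
from transformers import AutoModel, AutoTokenizer

# "bert-base-uncased" is one of many pretrained models available on the Hub
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, Hugging Face!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```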

If you're getting into the world of AI and machine learning, you're going to come across Hugging Face.

Four browser screenshots displaying different sections of Hugging Face's ecosystem. Top-left: the Hugging Face Transformers page on accessing, training, and fine-tuning state-of-the-art models. Top-right: the Hugging Face Datasets page on storing, processing, and accessing datasets. Bottom-left: the Hugging Face Tokenizers page on preprocessing and preparing text data for ML models. Bottom-right: the Hugging Face Hub page on storing and sharing models, datasets, and demos.

A handful of pieces from the Hugging Face ecosystem. There are many more available in the Hugging Face documentation.

Why Hugging Face?

Many of the biggest companies in the world use Hugging Face for their open-source machine learning projects including Apple, Google, Facebook (Meta), Microsoft, OpenAI, ByteDance and more.

Not only does Hugging Face make it easy to use state-of-the-art machine learning models such as Stable Diffusion (for image generation) and Whisper (for audio transcription), it also makes it easy to share your own models, datasets and resources.

Aside from your own website, consider Hugging Face the homepage of your AI/machine learning profile.

TODO

  • Prerequisites
  • Ecosystem overview (transformers, datasets, accelerate, tokenizers, Spaces, demos, models, hub etc.)
  • Text classification
  • Object detection
  • Named entity recognition
  • LLM fine-tuning
  • VLM fine-tuning
  • RAG workflow
  • Zero-shot image classification/multi-modal workflows (CLIP)

Setup

See setup.

Resources

FAQ

Is this an official Hugging Face website?

No, it's a personal project by me (Daniel Bourke) to learn and help others learn the Hugging Face ecosystem.