A PyTorch Library for Efficient Neural Network Training

Train Faster, Reduce Cost, Get Better Models

[Website] - [Getting Started] - [Docs] - [Methods] - [We're Hiring!]

👋 Welcome

Composer is a PyTorch library that enables you to train neural networks faster, at lower cost, and to higher accuracy. We've implemented more than two dozen speedup methods that can be applied to your training loop in just a few lines of code, or used with our built-in Trainer. We continually integrate the latest state-of-the-art in efficient neural network training.

Composer features:

20+ methods for speeding up training networks for computer vision and natural language. Don't waste hours trying to reproduce research papers when Composer has done the work for you.
An easy-to-use trainer that has been written to be as performant as possible and integrates best practices for efficient, multi-GPU training.
Functional forms of all of our speedup methods that allow you to integrate them into your existing training loop.
Strong, reproducible baselines to get you started as quickly as possible.

Benefits

With no additional tuning, you can apply our methods to:

Train ResNet-50 on ImageNet to the standard 76.6% top-1 accuracy for $15 in 27 minutes (with vanilla PyTorch: $116 in 3.5 hours) on AWS.
Train GPT-2 125M to the standard perplexity of 24.11 for $145 in 4.5 hours (with vanilla PyTorch: $255 in 7.8 hours) on AWS.
Train DeepLab-v3 on ADE20k to the standard mean IoU of 45.7 for $36 in 1.1 hours (with vanilla PyTorch: $110 in 3.5 hours) on AWS.
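
To put the figures above on a common scale, the short sketch below turns each pair of numbers into a cost ratio and a time ratio. It uses only the dollar amounts and durations quoted in this section; the names `runs` and `savings` are illustrative and not part of Composer.

```python
# (cost in USD, wall-clock time in hours), copied from the list above.
runs = {
    "ResNet-50 / ImageNet": {"composer": (15, 27 / 60), "baseline": (116, 3.5)},
    "GPT-2 125M": {"composer": (145, 4.5), "baseline": (255, 7.8)},
    "DeepLab-v3 / ADE20k": {"composer": (36, 1.1), "baseline": (110, 3.5)},
}


def savings(entry):
    """Return (cost ratio, time ratio) of vanilla PyTorch relative to Composer."""
    (c_cost, c_hours), (b_cost, b_hours) = entry["composer"], entry["baseline"]
    return b_cost / c_cost, b_hours / c_hours


for name, entry in runs.items():
    cost_ratio, time_ratio = savings(entry)
    print(f"{name}: {cost_ratio:.1f}x cheaper, {time_ratio:.1f}x faster")
```

The cost and time ratios track each other closely in every row, which is what you would expect if both runs are billed on the same hardware at the same hourly rate.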

🚀 Quickstart

💾 Installation

Composer is available with Pip:

pip install mosaicml

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml

🚌 Usage

You can use Composer's speedup methods in two ways:

Through a standalone Functional API (similar to torch.nn.functional) that allows you to integrate them into your existing training code.
Using Composer's built-in Trainer, which is designed to be performant and automatically takes care of the details of using speedup methods.

Example: Functional API

Integrate our speedup methods into your training loop with just a few lines of code, and see the results. Here we easily apply BlurPool and SqueezeExcite:

import composer.functional as cf
from torchvision import models

my_model = models.resnet18()

# add blurpool and squeeze excite layers
cf.apply_blurpool(my_model)
cf.apply_squeeze_excite(my_model)

# your own training code starts here

For more examples, see the Composer Functional API Colab notebook and Functional API guide.

Example: Trainer

For the best experience and the most efficient possible training, we recommend using Composer's built-in trainer, which automatically takes care of the details of using speedup methods and provides useful abstractions that facilitate rapid experimentation.

import torch

# adaptive_avg_pool2d_backward_cuda in mnist_classifier is not deterministic
torch.use_deterministic_algorithms(False)

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from composer import Trainer
from composer.algorithms import ChannelsLast, CutMix, LabelSmoothing
from composer.models import mnist_model

transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST("data", download=True, train=True, transform=transform)
eval_dataset = datasets.MNIST("data", download=True, train=False, transform=transform)
train_dataloader = DataLoader(train_dataset, batch_size=128)
eval_dataloader = DataLoader(eval_dataset, batch_size=128)

trainer = Trainer(
    model=mnist_model(),
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    max_duration="1ep",
    algorithms=[
        ChannelsLast(),
        CutMix(alpha=1.0),
        LabelSmoothing(smoothing=0.1),
    ]
)
trainer.fit()

Composer's built-in trainer makes it easy to add multiple speedup methods in a single line of code! Trying out new methods or combinations of methods is as easy as changing a single list.

Here are some examples of methods available in Composer (see here for the full list):

Name	Attribution	tl;dr	Example Benchmark	Speed Up*
Alibi	Press et al, 2021	Replaces position embeddings with ALiBi attention biases.	GPT-2	1.5x
BlurPool	Zhang, 2019	Applies an anti-aliasing filter before every downsampling operation.	ResNet-101	1.2x
ChannelsLast	PyTorch	Uses channels last memory format (NHWC).	ResNet-101	1.5x
CutOut	DeVries et al, 2017	Randomly erases rectangular blocks from the image.	ResNet-101	1.2x
LabelSmoothing	Szegedy et al, 2015	Smooths the labels with a uniform prior.	ResNet-101	1.5x
MixUp	Zhang et al, 2017	Blends pairs of examples and labels.	ResNet-101	1.5x
RandAugment	Cubuk et al, 2020	Applies a series of random augmentations to each image.	ResNet-101	1.3x
SAM	Foret et al, 2021	An optimization strategy that seeks flatter minima.	ResNet-101	1.4x
SeqLengthWarmup	Li et al, 2021	Progressively increase sequence length.	GPT-2	1.2x
Stochastic Depth	Huang et al, 2016	Replaces a specified layer with a stochastic version that randomly drops the layer or samples during training.	ResNet-101	1.1x

* = time-to-train to the same quality as the baseline.

🛠 Building Speedup Recipes

Given two methods that speed up training by 1.5x each, do they combine to provide a 2.25x (1.5x * 1.5x) speedup? Not necessarily. They may optimize the same part of the training process and lead to diminishing returns, or they may even interact in ways that prove detrimental. Determining which methods to compose together isn't as simple as assembling a set of methods that perform best individually.

We have come up with compositions of methods that work especially well together through rigorous exploration of the design space of recipes and research on the science behind composition.

As an example, here are two performant recipes, one for ResNet-101 on ImageNet, and the other for GPT-2 on OpenWebText, on 8xA100s:

ResNet-101

Name	Functional	tl;dr	Benchmark	Speed Up
Blur Pool	`cf.apply_blurpool`	Applies an anti-aliasing filter before every downsampling operation.	ResNet-101	1.2x
Channels Last	`cf.apply_channels_last`	Uses channels last memory format (NHWC).	ResNet-101	1.5x
Label Smoothing	`cf.smooth_labels`	Smooths the labels with a uniform prior.	ResNet-101	1.5x
MixUp	`cf.mixup_batch`	Blends pairs of examples and labels.	ResNet-101	1.5x
Progressive Resizing	`cf.resize_batch`	Increases the input image size during training.	ResNet-101	1.3x
SAM	`N/A`	An optimization strategy that seeks flatter minima.	ResNet-101	1.5x
Composition	`N/A`	Cheapest: $49 @ 78.1% Acc	ResNet-101	3.5x

GPT-2

Name	Functional	tl;dr	Benchmark	Speed Up
Alibi	`cf.apply_alibi`	Replaces position embeddings with ALiBi attention biases.	GPT-2	1.6x
Seq Length Warmup	`cf.set_batch_sequence_length`	Progressively increase sequence length.	GPT-2	1.5x
Composition	`N/A`	Cheapest: $145 @ 24.11 PPL	GPT-2	1.7x
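
The two recipe tables above also show why individual speedups cannot simply be multiplied. The sketch below computes the naive product of the per-method figures and compares it with the reported Composition row. It assumes nothing beyond the numbers in the tables; `naive_product` and the dictionary names are illustrative only.

```python
import math

# Per-method speedups copied from the two recipe tables above.
resnet101 = {
    "Blur Pool": 1.2,
    "Channels Last": 1.5,
    "Label Smoothing": 1.5,
    "MixUp": 1.5,
    "Progressive Resizing": 1.3,
    "SAM": 1.5,
}
gpt2 = {"Alibi": 1.6, "Seq Length Warmup": 1.5}

# Speedups reported in the "Composition" rows.
reported = {"ResNet-101": 3.5, "GPT-2": 1.7}


def naive_product(speedups):
    """Speedup you would get if every method helped independently."""
    return math.prod(speedups.values())


for name, recipe in (("ResNet-101", resnet101), ("GPT-2", gpt2)):
    print(f"{name}: naive {naive_product(recipe):.2f}x, reported {reported[name]}x")
```

In both recipes the naive product is well above the reported figure, consistent with the point that methods overlap and yield diminishing returns when combined.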

⚙️ What benchmarks does Composer support?

We'll use the word benchmark to denote a specific model trained on a specific dataset, with model quality assessed using a specific metric.

Composer features computer vision and natural language processing benchmarks including (but not limited to):

Model	Dataset	Loss	Task	Evaluation Metrics
Computer Vision
ResNet Family	CIFAR-10	Cross Entropy	Image Classification	Classification Accuracy
ResNet Family	ImageNet	Cross Entropy	Image Classification	Classification Accuracy
EfficientNet Family	ImageNet	Cross Entropy	Image Classification	Classification Accuracy
UNet	BraTS	Dice Loss	Image Segmentation	Dice Coefficient
DeepLab v3	ADE20K	Cross Entropy	Image Segmentation	mIoU
Natural Language Processing
BERT Family	{Wikipedia & BooksCorpus, C4}	Cross Entropy	Masked Language Modeling	GLUE
GPT Family	{OpenWebText, C4}	Cross Entropy	Language Modeling	Perplexity

🤔 Why should I use Composer?

Speed

The compute required to train a state-of-the-art machine learning model is doubling every 6 months, putting such models further and further out of reach for most researchers and practitioners with each passing day.

Composer addresses this challenge by focusing on training efficiency: it contains cutting-edge speedup methods that modify the training algorithm to reduce the time and cost necessary to train deep learning models. When you use Composer, you can rest assured that you are training efficiently. We have combed the literature, done the science, and built industrial-grade implementations to ensure this is the case.

Flexibility

Even after these speedup methods are implemented, assembling them together into recipes is nontrivial. We designed Composer with the right abstractions for composing (and creating new) speedup methods.

Specifically, Composer uses two-way callbacks (Howard et al, 2020) to modify the entire training state at particular events in the training loop to effect speedups. We handle collisions between methods, proper method ordering, and more.

Through this, methods can modify:

data inputs for batches (data augmentations, sequence length warmup, skipping examples, etc.)
neural network architecture (pruning, model surgery, etc.)
loss function (label smoothing, MixUp, CutMix, etc.)
optimizer (Sharpness Aware Minimization)
training dynamics (layer freezing, selective backprop, etc.)

You can easily add your own methods or callbacks to try out your ideas or modify any part of the training loop.
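
The two-way callback idea described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not Composer's implementation: `Event`, `State`, `HalveBatch`, `ScaleLoss`, `fire`, and `train_step` are hypothetical names invented for this example, and the "loss" is just the sum of the batch.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Event(Enum):
    """Hypothetical points in a training step where methods may run."""
    BATCH_START = auto()
    AFTER_LOSS = auto()


@dataclass
class State:
    """Hypothetical shared training state that every method can read and write."""
    batch: list
    loss: float = 0.0


class HalveBatch:
    """Toy data-side method: keeps only the first half of each batch."""

    def match(self, event, state):
        return event is Event.BATCH_START

    def apply(self, event, state):
        state.batch = state.batch[: len(state.batch) // 2]


class ScaleLoss:
    """Toy loss-side method: rescales the loss after it is computed."""

    def match(self, event, state):
        return event is Event.AFTER_LOSS

    def apply(self, event, state):
        state.loss *= 0.5


def fire(event, state, methods):
    # Each method decides whether to run, then mutates the state in place.
    for method in methods:
        if method.match(event, state):
            method.apply(event, state)


def train_step(state, methods):
    fire(Event.BATCH_START, state, methods)
    state.loss = float(sum(state.batch))  # stand-in for forward pass + loss
    fire(Event.AFTER_LOSS, state, methods)
    return state


state = train_step(State(batch=[1, 2, 3, 4]), [HalveBatch(), ScaleLoss()])
print(state.batch, state.loss)
```

The callbacks are "two-way" because each method both reads the state and writes back to it, so one method can change the data while another changes the loss within the same step, and the loop itself never needs to know which methods are active.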

Support

Composer is an active and ongoing project. We will respond quickly to issues filed in this repository.

🧐 Why shouldn’t I use Composer?

Composer is mostly optimized for computer vision and natural language processing. If you work on, e.g., reinforcement learning, you might encounter rough edges when using Composer.
Composer currently only supports NVIDIA GPUs, although we're working on adding alternatives.
Since Composer is still in alpha, our API may not be stable. We recommend pinning your work to a specific Composer version.

📚 Learn More

Here are some resources actively maintained by the Composer community to help you get started:

Resource	Details
Getting started with our Trainer	A Colab Notebook showing how to use our Trainer
Getting started with our Functional API	A Colab Notebook showing how to use our Functional API
Building Speedup Methods	A Colab Notebook showing how to build new training modifications on top of Composer
Training BERTs with Composer and 🤗	A Colab Notebook showing how to train BERT models with Composer and 🤗!

If you have any questions, please feel free to reach out to us on Twitter, email, or our Community Slack!

💫 Contributors

Composer is part of the broader Machine Learning community, and we welcome any contributions, pull requests, or issues!

To start contributing, see our Contributing page.

P.S.: We're hiring!

✍️ Citation

@misc{mosaicml2022composer,
    author = {The Mosaic ML Team},
    title = {composer},
    year = {2021},
    howpublished = {\url{https://github.com/mosaicml/composer/}},
}
