Run your *raw* PyTorch training script on any kind of device
Easy to integrate
Here is an example:
```diff
  import torch
  import torch.nn.functional as F
  from datasets import load_dataset

+ from accelerate import Accelerator
+ accelerator = Accelerator()

- device = 'cpu'
+ device = accelerator.device

  model = torch.nn.Transformer().to(device)
  optimizer = torch.optim.Adam(model.parameters())

  dataset = load_dataset('my_dataset')
  data = torch.utils.data.DataLoader(dataset, shuffle=True)

+ model, optimizer, data = accelerator.prepare(model, optimizer, data)

  model.train()
  for epoch in range(10):
      for source, targets in data:
          source = source.to(device)
          targets = targets.to(device)

          optimizer.zero_grad()

          output = model(source)
          loss = F.cross_entropy(output, targets)

-         loss.backward()
+         accelerator.backward(loss)

          optimizer.step()
```
As you can see in this example, by adding five lines to any standard PyTorch training script, you can now run it on any kind of single or distributed node setting (single CPU, single GPU, multi-GPU, and TPU), as well as with or without mixed precision (fp16).
In particular, the same code can then be run without modification on your local machine for debugging or in your training environment. 🤗 Accelerate can even handle the device placement for you (which requires a few more changes to your code, but is safer in general), so you can simplify the script further:
```diff
  import torch
  import torch.nn.functional as F
  from datasets import load_dataset

+ from accelerate import Accelerator
- device = 'cpu'
+ accelerator = Accelerator()

- model = torch.nn.Transformer().to(device)
+ model = torch.nn.Transformer()
  optimizer = torch.optim.Adam(model.parameters())

  dataset = load_dataset('my_dataset')
  data = torch.utils.data.DataLoader(dataset, shuffle=True)

+ model, optimizer, data = accelerator.prepare(model, optimizer, data)

  model.train()
  for epoch in range(10):
      for source, targets in data:
-         source = source.to(device)
-         targets = targets.to(device)

          optimizer.zero_grad()

          output = model(source)
          loss = F.cross_entropy(output, targets)

-         loss.backward()
+         accelerator.backward(loss)

          optimizer.step()
```
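Mixed precision is controlled by the same `Accelerator` object. A minimal sketch, assuming the `fp16` flag also used in the DeepSpeed section below; the training loop itself stays unchanged:

```python
from accelerate import Accelerator

# Run the whole script in mixed precision (fp16) instead of full fp32
accelerator = Accelerator(fp16=True)
```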
Want to learn more? Check out the documentation or have a look at our examples.
Launching script
🤗 Accelerate also provides a CLI tool that lets you quickly configure and test your training environment before launching the scripts. No need to remember how to use `torch.distributed.launch` or to write a specific launcher for TPU training!
On your machine(s) just run:

```bash
accelerate config
```

and answer the questions asked. This will generate a config file that will be used automatically to properly set the default options when doing

```bash
accelerate launch my_script.py --args_to_my_script
```
For instance, here is how you would run the GLUE example on the MRPC task (from the root of the repo):

```bash
accelerate launch examples/nlp_example.py
```
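You can also sanity-check your saved configuration before a real run; assuming the CLI's `accelerate test` subcommand, which launches a short script that verifies your distributed setup:

```bash
accelerate test
```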
This CLI tool is optional, and you can still use `python my_script.py` or `python -m torch.distributed.launch my_script.py` at your convenience.
Launching multi-CPU run using MPI
Once you have MPI set up on your cluster, just run:

```bash
mpirun -np 2 python examples/nlp_example.py
```
Launching training using DeepSpeed
🤗 Accelerate supports training with DeepSpeed. To use it, you don't need to change anything in your training code; you can set everything using just `accelerate config`. However, if you want to tweak your DeepSpeed-related args from your Python script, we provide you the `DeepSpeedPlugin`:
```python
from accelerate import Accelerator, DeepSpeedPlugin

# DeepSpeed needs to know your gradient accumulation steps beforehand, so don't forget to pass it
# Remember you still need to do gradient accumulation yourself, just like you would have done without DeepSpeed
deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=2)
accelerator = Accelerator(fp16=True, deepspeed_plugin=deepspeed_plugin)

# How to save your 🤗 Transformer?
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    save_dir,
    save_function=accelerator.save,
    state_dict=accelerator.get_state_dict(model),
)
```
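As the comments note, the plugin only informs DeepSpeed of the accumulation schedule; your loop still has to accumulate gradients itself. A minimal sketch under the assumptions above (`gradient_accumulation_steps=2`, with the `model`, `optimizer`, and `data` prepared as in the earlier example):

```python
gradient_accumulation_steps = 2

model.train()
for step, (source, targets) in enumerate(data):
    output = model(source)
    # scale the loss so the accumulated gradients average over the accumulation window
    loss = F.cross_entropy(output, targets) / gradient_accumulation_steps
    accelerator.backward(loss)
    # only step and reset gradients every `gradient_accumulation_steps` batches
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```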
Note: DeepSpeed support is experimental for now. If you run into any problems, please open an issue.
Launching your training from a notebook
🤗 Accelerate also provides a `notebook_launcher` function you can use in a notebook to launch a distributed training. This is especially useful for Colab or Kaggle notebooks with a TPU backend. Just define your training loop in a `training_function`, then in your last cell, add:
```python
from accelerate import notebook_launcher

notebook_launcher(training_function)
```
An example can be found in this notebook.
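If your training function takes arguments, you can pass them through the launcher; a minimal sketch, assuming `notebook_launcher`'s `args` and `num_processes` parameters:

```python
from accelerate import notebook_launcher

# launch `training_function(model, lr)` on 8 processes (e.g. the 8 cores of a TPU)
notebook_launcher(training_function, args=(model, 1e-3), num_processes=8)
```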
Why should I use 🤗 Accelerate?

You should use 🤗 Accelerate when you want to easily run your training scripts in a distributed environment without having to renounce full control over your training loop. This is not a high-level framework above PyTorch, just a thin wrapper so you don't have to learn a new library. In fact, the whole API of 🤗 Accelerate is in one class, the `Accelerator` object.
Why shouldn't I use 🤗 Accelerate?

You shouldn't use 🤗 Accelerate if you don't want to write a training loop yourself. There are plenty of high-level libraries above PyTorch that will offer you that; 🤗 Accelerate is not one of them.
Frameworks using 🤗 Accelerate

If you like the simplicity of 🤗 Accelerate but would prefer a higher-level abstraction around your training loop, some frameworks that are built on top of 🤗 Accelerate are listed below:
- pytorch-accelerated is a lightweight training library, with a streamlined feature set centred around a general-purpose Trainer, that places a huge emphasis on simplicity and transparency; enabling users to understand exactly what is going on under the hood, but without having to write and maintain the boilerplate themselves!
- Kornia is a differentiable library that allows classical computer vision to be integrated into deep learning models. Kornia provides a Trainer with the specific purpose to train and fine-tune the supported deep learning algorithms within the library.
Installation
This repository is tested on Python 3.6+ and PyTorch 1.4.0+.

You should install 🤗 Accelerate in a virtual environment.
First, create a virtual environment with the version of Python you're going to use and activate it.
Then, you will need to install PyTorch: refer to the official installation page regarding the specific install command for your platform. 🤗 Accelerate can then be installed using pip as follows:

```bash
pip install accelerate
```
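To check that the install worked, you can, for instance, print the installed version (`accelerate.__version__` is the package's version string):

```bash
python -c "import accelerate; print(accelerate.__version__)"
```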
Supported integrations
- CPU only
- multi-CPU on one node (machine)
- multi-CPU on several nodes (machines)
- single GPU
- multi-GPU on one node (machine)
- multi-GPU on several nodes (machines)
- TPU
- FP16 with native AMP (apex on the roadmap)
- DeepSpeed support (experimental)