xFormers is a modular and field-agnostic library to flexibly generate transformer architectures from interoperable and optimized building blocks. These blocks are not limited to xFormers and can also be cherry-picked as the user sees fit.
The full documentation contains instructions for getting started, deep dives and tutorials about the various APIs. If in doubt, please check out the HOWTO. Only some general considerations are laid out in the README.
For recent changes, you can have a look at the changelog.
To install xFormers, it is recommended to use a dedicated virtual environment, as is often the case with Python, through python-virtualenv or conda for instance.
PyTorch must be installed. Using conda for example:
conda create --name xformers python=3.10
conda activate xformers
conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1
Please note that PyTorch 1.12 or newer is required.
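The version requirement can be checked before going any further. The sketch below is a minimal, hypothetical helper (meets_minimum is not part of xFormers or PyTorch) that compares a version string, such as the one reported by torch.__version__, against the 1.12 minimum, ignoring local suffixes such as +cu116.

```python
import re


def meets_minimum(version: str, minimum: tuple = (1, 12)) -> bool:
    """Return True if `version` (e.g. "1.12.1+cu116") is at least `minimum`."""
    # Keep only the leading "major.minor" digits; suffixes such as "+cu116" are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check

        status = "OK" if meets_minimum(torch.__version__) else "too old"
        print("PyTorch", torch.__version__, status)
    except ImportError:
        print("PyTorch is not installed")
```

The comparison is done on integer tuples rather than on strings, so that 1.9 is correctly treated as older than 1.12.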
There are two ways you can install xFormers locally:
Conda dev packages
There are regular builds of xFormers as it is developed on the main branch.
To use these, you must be on Linux and have a conda environment with Python 3.9 or 3.10, CUDA 11.3 or 11.6, and PyTorch 1.12.1.
You can install the latest with
conda install xformers -c xformers/label/dev
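The constraints listed above can be turned into a quick pre-flight check. This is only a sketch: conda_dev_issues and the three constants are hypothetical names, and the supported values are copied from the paragraph above, so they may be out of date for newer builds.

```python
import platform
import sys

# Constraints copied from the README text; they may change with newer builds.
SUPPORTED_PYTHON = {(3, 9), (3, 10)}
SUPPORTED_CUDA = {"11.3", "11.6"}
SUPPORTED_TORCH = "1.12.1"


def conda_dev_issues(system, python_version, cuda_version, torch_version):
    """Return a list of human-readable problems; an empty list means compatible."""
    issues = []
    if system != "Linux":
        issues.append(f"unsupported OS: {system}")
    if tuple(python_version[:2]) not in SUPPORTED_PYTHON:
        issues.append(f"unsupported Python: {python_version[0]}.{python_version[1]}")
    if cuda_version not in SUPPORTED_CUDA:
        issues.append(f"unsupported CUDA: {cuda_version}")
    # Drop a local suffix such as "+cu116" before comparing.
    if torch_version.split("+")[0] != SUPPORTED_TORCH:
        issues.append(f"unsupported PyTorch: {torch_version}")
    return issues


if __name__ == "__main__":
    # The CUDA and PyTorch versions are placeholder values for this demo.
    print(conda_dev_issues(platform.system(), sys.version_info, "11.6", "1.12.1"))
```

With PyTorch installed, the last two arguments could be taken from torch.version.cuda and torch.__version__ instead of the placeholders.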
Build from source (dev mode)
These commands will fetch the latest version of the code and then install xFormers from source. If you want to build the sparse attention CUDA kernels, please make sure that the next point is covered prior to running these instructions.
git clone git@github.com:facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
conda create --name xformer_env python=3.8
conda activate xformer_env
pip install -r requirements.txt
pip install -e .
# or, for OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ pip install -e .
Sparse attention kernels
Installing the CUDA-based sparse attention kernels may require extra care, as this involves the CUDA toolchain. As a reminder, these kernels are built when you run pip install -e . and the CUDA buildchain is available (NVCC compiler). Re-building can for instance be done via python3 setup.py clean && python3 setup.py develop, or similarly by wiping the build folder and re-running pip install -e .
Some advice related to building these CUDA-specific components, tentatively addressing common pitfalls. Please make sure that:
- NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with module unload cuda; module load cuda/xx.x, possibly also nvcc
- the version of GCC that you're using matches the current NVCC capabilities
- the TORCH_CUDA_ARCH_LIST env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"
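When the build is driven from Python, the value of TORCH_CUDA_ARCH_LIST can be assembled from a list of compute capabilities. A minimal sketch, assuming (major, minor) tuples as input; arch_list is a hypothetical helper and the two architectures are chosen for illustration only.

```python
import os


def arch_list(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST value."""
    # Deduplicate and sort so that the result is stable across runs.
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


# Example: build for two architectures only (values chosen for illustration).
os.environ["TORCH_CUDA_ARCH_LIST"] = arch_list([(8, 6), (7, 0), (8, 6)])
print(os.environ["TORCH_CUDA_ARCH_LIST"])  # 7.0;8.6
```

The variable set this way is only visible to the current process and its children, so the build itself would have to be launched from the same process. If PyTorch is already installed, torch.cuda.get_device_capability() returns such a (major, minor) tuple for the local GPU, which restricts the build to the hardware at hand and shortens compilation.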
Triton
Some parts of xFormers use Triton, and will only expose themselves if Triton is installed and a compatible GPU is present (NVIDIA GPU with tensor cores). If Triton was not installed as part of the testing procedure, you can install it directly by running pip install triton. You can optionally test that the installation is successful by running one of the Triton-related benchmarks, for instance python3 xformers/benchmarks/benchmark_triton_softmax.py
Triton will cache the compiled kernels to /tmp/triton by default. If this becomes an issue, this path can be specified through the TRITON_CACHE_DIR environment variable.
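The two points above (is Triton installed, and where does it cache kernels) can also be handled from a script. The sketch below relies only on the standard library; triton_installed is a hypothetical helper, the cache directory name is an arbitrary choice, and the check says nothing about GPU compatibility.

```python
import importlib.util
import os
import tempfile


def triton_installed() -> bool:
    """Check whether the `triton` package can be found, without importing it."""
    return importlib.util.find_spec("triton") is not None


# Redirect the kernel cache before Triton is imported.
# The directory name below is an arbitrary choice for this sketch.
cache_dir = os.path.join(tempfile.gettempdir(), "my_triton_cache")
os.makedirs(cache_dir, exist_ok=True)
os.environ["TRITON_CACHE_DIR"] = cache_dir

print("Triton installed:", triton_installed())
print("Kernel cache:", os.environ["TRITON_CACHE_DIR"])
```

Setting the variable at the top of the script, before any import that may pull in Triton, is the conservative ordering.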
AOTAutograd/NVFuser
Some parts of xFormers use AOT Autograd from the FuncTorch library, and will only expose themselves if FuncTorch is installed, and a compatible GPU is present. If functorch was not installed as part of the testing procedure, you can install it directly through pip.
pip install functorch
Once installed, set the flag _is_functorch_available = True in xformers/__init__.py. You can optionally test that the installation is successful by running one of the functorch-related benchmarks: python3 xformers/benchmarks/benchmark_nvfuser.py
If you are importing the xFormers library in a script, you can modify the flag as such:
import xformers
xformers._is_functorch_available = True
To test the installation, you can run a benchmark of the attention mechanisms exposed by xFormers, which generates a runtime and memory plot. If this concludes without errors, the installation is successful. This step is optional, and you will need some extra dependencies for it to go through: pip install -r requirements-benchmark.txt.
Once this is done, you can run this particular benchmark as follows:
python3 xformers/benchmarks/benchmark_encoder.py --activations relu --plot -emb 256 -bs 32 -heads 16
Let's start from a classical overview of the Transformer architecture (illustration from Lin et al., "A Survey of Transformers").
You'll find the key repository boundaries in this illustration: a Transformer is generally made of a collection of attention mechanisms, embeddings to encode some positional information, feed-forward blocks and a residual path (typically referred to as pre- or post- layer norm). These boundaries do not work for all models, but we found in practice that, given some accommodations, they could capture most of the state of the art.
Models are thus not implemented in monolithic files, which are typically complicated to handle and modify. Most of the concepts present in the above illustration correspond to an abstraction level, and when variants are present for a given sub-block it should always be possible to select any of them. You can focus on a given encapsulation level and modify it as needed.
├── components # Parts zoo, any of which can be used directly
│ ├── attention
│ │ └ ... # all the supported attentions
│ ├── feedforward #
│ │ └ ... # all the supported feedforwards
│ ├── positional_embedding #
│ │ └ ... # all the supported positional embeddings
│ ├── activations.py #
│ └── multi_head_dispatch.py # (optional) multihead wrap
│
├── factory # Build model programmatically
│ ├── block_factory.py # (optional) helper to programmatically generate layers
│ └── model_factory.py # (optional) helper to programmatically generate models
│
├── benchmarks
│ └ ... # A lot of benchmarks that you can use to test some parts
└── triton
  └ ... # (optional) all the triton parts, requires triton + CUDA gpu
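The idea of a parts zoo plus factories can be illustrated with a small name-based registry. This is a toy sketch of the design, not the xFormers implementation: REGISTRIES, register, build_block and the builder names are all hypothetical, and plain dictionaries stand in for real modules.

```python
from typing import Callable, Dict

# Hypothetical registries, one per encapsulation level (names are illustrative).
REGISTRIES: Dict[str, Dict[str, Callable[..., dict]]] = {
    "attention": {},
    "feedforward": {},
}


def register(kind: str, name: str):
    """Decorator that records a builder under REGISTRIES[kind][name]."""
    def decorator(builder):
        REGISTRIES[kind][name] = builder
        return builder
    return decorator


@register("attention", "scaled_dot_product")
def build_sdp(dropout: float = 0.0) -> dict:
    return {"type": "scaled_dot_product", "dropout": dropout}


@register("attention", "local")
def build_local(window_size: int = 5) -> dict:
    return {"type": "local", "window_size": window_size}


@register("feedforward", "mlp")
def build_mlp(hidden_multiplier: int = 4) -> dict:
    return {"type": "mlp", "hidden_multiplier": hidden_multiplier}


def build_block(config: dict) -> dict:
    """Assemble a block description by looking up each part by name."""
    block = {}
    for kind, part_config in config.items():
        params = dict(part_config)  # copy, so the caller's config is untouched
        name = params.pop("name")
        block[kind] = REGISTRIES[kind][name](**params)
    return block


block = build_block({
    "attention": {"name": "local", "window_size": 7},
    "feedforward": {"name": "mlp"},
})
print(block)
```

Swapping the attention variant only requires changing the name in the configuration, which is what makes this kind of layout friendly to sweeps.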
Attention mechanisms
- whenever a sparse enough mask is passed
- courtesy of Triton
- Local. Notably used in (and many others)
- See BigBird, Longformers, ...
- See BigBird, Longformers, ...
- ... add a new one, see Contribution.md
Initializations
This is completely optional, and will only occur when generating full models through xFormers, not when picking parts individually.
There are basically two initialization mechanisms exposed, but the user is free to initialize weights as they see fit after the fact.
- Parts can expose an init_weights() method, which defines sane defaults
- xFormers supports specific init schemes, which can take precedence over init_weights()
If the second code path is being used (constructing the model through the model factory), we check that all the weights have been initialized, and possibly error out if that is not the case (if you set xformers.factory.weight_init.__assert_if_not_initialized = True).
Supported initialization schemes are:
One way to specify the init scheme is to set the config.weight_init field to the matching enum value.
This could easily be extended, feel free to submit a PR!
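The interplay between the two mechanisms and the final check can be sketched with toy classes. Everything below is hypothetical (Part, BarePart, zeros_scheme and initialize are not xFormers names) and only illustrates one possible reading of the precedence rule described above.

```python
class Part:
    """Toy stand-in for a model part; `weights` is None until initialized."""

    def __init__(self, size: int):
        self.size = size
        self.weights = None

    def init_weights(self):
        # Part-specific default: small constant values.
        self.weights = [0.02] * self.size


class BarePart:
    """A part that does not expose init_weights()."""

    def __init__(self, size: int):
        self.size = size
        self.weights = None


def zeros_scheme(part):
    """Hypothetical model-level scheme that takes precedence over init_weights()."""
    part.weights = [0.0] * part.size


def initialize(parts, scheme=None, assert_if_not_initialized=False):
    for part in parts:
        if scheme is not None:
            scheme(part)  # the scheme wins when one is given
        elif hasattr(part, "init_weights"):
            part.init_weights()  # otherwise fall back to the part's default
    # Final check: report parts whose weights were never set.
    missing = [p for p in parts if p.weights is None]
    if missing and assert_if_not_initialized:
        raise RuntimeError(f"{len(missing)} part(s) left uninitialized")
    return parts


parts = initialize([Part(2), BarePart(2)])
print([p.weights for p in parts])  # [[0.02, 0.02], None]
```

With assert_if_not_initialized=True, the same call would raise, because the second part has neither a default nor a scheme applied to it.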
- Many attention mechanisms, interchangeable
- Optimized building blocks, beyond PyTorch primitives
  - sparse attention
  - block-sparse attention
  - fused softmax
  - fused linear layer
  - fused layer norm
  - fused dropout(activation(x+bias))
- Benchmarking and testing tools
  - micro benchmarks
  - transformer block benchmark
  - LRA, with SLURM support
- Programmatic and sweep-friendly layer and model construction
  - Compatible with hierarchical Transformers, like Swin or Metaformer
- Hackable
  - Not using monolithic CUDA kernels, composable building blocks
  - Using Triton for some optimized parts, explicit, pythonic and user-accessible
  - Native support for SquaredReLU (on top of ReLU, LeakyReLU, GeLU, ...), extensible activations
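To make the dropout(activation(x+bias)) pattern concrete, here is an unfused, pure-Python reference of what such a block computes, using SquaredReLU as the activation. It is a sketch for reading purposes only, not the optimized xFormers kernel; the function names are hypothetical and the usual inverted-dropout scaling is assumed.

```python
import random


def squared_relu(value: float) -> float:
    """SquaredReLU: relu(value) squared."""
    return max(value, 0.0) ** 2


def dropout_activation_bias(x, bias, activation=squared_relu, p=0.0, rng=None):
    """Unfused reference for dropout(activation(x + bias)) on a list of floats."""
    rng = rng or random.Random(0)
    out = []
    for value, b in zip(x, bias):
        activated = activation(value + b)
        # Inverted dropout: zero with probability p, rescale survivors by 1 / (1 - p).
        keep = rng.random() >= p
        out.append(activated / (1.0 - p) if keep else 0.0)
    return out


print(dropout_activation_bias([1.0, -2.0, 0.5], [0.5, 0.5, 0.5]))  # [2.25, 0.0, 1.0]
```

A fused kernel performs these three steps in a single pass over memory, which is where the gain over chaining separate primitives comes from.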
We've tried to collect a relatively exhaustive list of explanations in the HOWTO.
xFormers has a BSD-style license, as found in the LICENSE file.
If you use xFormers in your publication, please cite it by using the following BibTeX entry.
@Misc{xFormers2021,
author = {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Wenhan Xiong and Vittorio Caggiano and Sean Naren and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang},
title = {xFormers: A modular and hackable Transformer modelling library},
howpublished = {\url{https://github.com/facebookresearch/xformers}},
year = {2021}
}
The following repositories are used in xFormers, either in close to original form or as an inspiration: