Uni-Core, an efficient distributed PyTorch framework

Uni-Core is built for rapidly creating high-performance PyTorch models, especially Transformer-based models. It supports the following features (a small illustrative sketch follows the list):

  • Distributed training over multi-GPUs and multi-nodes
  • Mixed-precision training with fp16 and bf16
  • High-performance fused CUDA kernels
  • Model checkpoint management
  • Friendly logging
  • Buffered (GPU-CPU overlapping) data loader
  • Gradient accumulation
  • Commonly used optimizers and LR schedulers
  • Easy to create new models
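As a rough illustration of two of the features above, the sketch below shows fp16 mixed-precision training with gradient accumulation in plain PyTorch on dummy data; it uses only standard torch APIs and is not Uni-Core's own trainer.

```python
import torch

# Stand-in model and dummy data, purely for illustration.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(16)]

scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16
accumulate_steps = 4                  # update the weights every 4 micro-batches

for step, (x, y) in enumerate(loader):
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(x.cuda()), y.cuda())
    # Scale the loss down so the accumulated gradient matches a full batch.
    scaler.scale(loss / accumulate_steps).backward()
    if (step + 1) % accumulate_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```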

Installation

Build from source

You can use python setup.py install or pip install . to build Uni-Core from source. The CUDA toolkit version in the build environment should match the CUDA version that PyTorch was built with.
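To see which CUDA version your installed PyTorch was built against (the version your build environment's CUDA toolkit, as reported by nvcc --version, should match), you can run for example:

```python
import torch

# PyTorch release and the CUDA version it was compiled with.
print(torch.__version__)
print(torch.version.cuda)
```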

You can also use python setup.py install --disable-cuda-ext to disable the CUDA extension operators when CUDA is not available.

Use pre-compiled python wheels

We also provide pre-compiled wheels built by GitHub Actions. You can download them from the Releases page. Make sure to check the Python version, PyTorch version, and CUDA version. For example, for PyTorch 1.12.1, Python 3.7, and CUDA 11.3, you can install unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl.
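One way to print the version triple that the wheel filename has to match (a small helper script, not part of Uni-Core):

```python
import sys
import torch

# The three values encoded in the wheel filename, e.g.
# unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl
print(f"python {sys.version_info.major}.{sys.version_info.minor}")
print(f"torch  {torch.__version__}")
print(f"cuda   {torch.version.cuda}")
```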

Docker image

We also provide a Docker image. You can pull it with docker pull dptechnology/unicore:0.0.1-pytorch1.11.0-cuda11.3. To use GPUs within Docker, you need to install nvidia-docker-2 first.

Example

To build a model, you can refer to example/bert.
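In broad strokes, a model is a registered torch.nn.Module subclass. The sketch below assumes a fairseq-style registration API; the unicore.models module path, the register_model decorator, and the BaseUnicoreModel base class are assumptions here, and example/bert shows the actual, authoritative pattern.

```python
import torch
# Assumed import path, based on Uni-Core's fairseq heritage; see example/bert.
from unicore.models import BaseUnicoreModel, register_model


@register_model("toy_lm")
class ToyLM(BaseUnicoreModel):
    """A toy model; the decorator and base-class names are assumptions."""

    def __init__(self, args, dictionary):
        super().__init__()
        self.embed = torch.nn.Embedding(len(dictionary), args.embed_dim)
        self.out = torch.nn.Linear(args.embed_dim, len(dictionary))

    @classmethod
    def build_model(cls, args, task):
        # Tasks in fairseq-style frameworks typically carry the dictionary.
        return cls(args, task.dictionary)

    def forward(self, src_tokens, **unused):
        return self.out(self.embed(src_tokens))
```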

Related projects

Acknowledgement

The main framework is from facebookresearch/fairseq.

The fused kernels are from guolinke/fused_ops.

The Dockerfile is from guolinke/pytorch-docker.

License

This project is licensed under the terms of the MIT license. See LICENSE for additional details.