This repository was archived (read-only) on Nov 21, 2022. Thanks to everyone who contributed to lightning-transformers; we feel it's time to move on.
🤗 Transformers can already be trained easily with the Lightning ⚡ Trainer. Here's a recent example from the community: https://sachinruk.github.io/blog/deep-learning/2022/11/07/t5-for-grammar-correction.html. Note that there are no limitations or workarounds; things just work out of the box.
The lightning-transformers repo explored providing task-specific modules and pre-baked defaults, at the cost of introducing extra abstractions. In the spirit of keeping ourselves focused, we do not wish to continue supporting these abstractions.
If you liked lightning-transformers and want to continue developing it, feel free to fork the repo and choose another name for the project.
```bash
pip install lightning-transformers
```
From Source
```bash
git clone https://github.com/PyTorchLightning/lightning-transformers.git
cd lightning-transformers
pip install .
```
Lightning Transformers provides LightningModules, LightningDataModules and Strategies to use 🤗 Transformers with the PyTorch Lightning Trainer.
Train bert-base-cased on the CARER emotion dataset using the Text Classification task.
```python
import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.text_classification import (
    TextClassificationDataModule,
    TextClassificationTransformer,
)

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="bert-base-cased"
)
dm = TextClassificationDataModule(
    batch_size=1,
    dataset_name="emotion",
    max_length=512,
    tokenizer=tokenizer,
)
model = TextClassificationTransformer(
    pretrained_model_name_or_path="bert-base-cased", num_labels=dm.num_classes
)
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)

trainer.fit(model, dm)
```
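The max_length=512 argument bounds how long each tokenized example may be. As a rough, hypothetical sketch of what fixed-length truncation and padding usually do (this is not the data module's own code; the helper name pad_or_truncate and the pad id of 0 are assumptions for illustration), each sequence of token ids is cut to max_length, shorter sequences are right-padded, and an attention mask marks which positions hold real tokens:

```python
from typing import List, Tuple


def pad_or_truncate(
    token_ids: List[int], max_length: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: truncate or right-pad token ids to max_length.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding. pad_id=0 is an assumption, not a library default.
    """
    ids = token_ids[:max_length]  # drop tokens beyond the length limit
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)  # zero when the input was truncated
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


if __name__ == "__main__":
    ids, mask = pad_or_truncate([5, 6, 7], max_length=5)
    print(ids, mask)  # [5, 6, 7, 0, 0] [1, 1, 1, 0, 0]
```

Padding every example to the same length keeps batch shapes fixed; the mask lets the model ignore the padded positions.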
Train a pre-trained mt5-base backbone on the WMT16 dataset (English to Romanian) using the Translation task.
```python
import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.translation import (
    TranslationTransformer,
    WMT16TranslationDataModule,
)

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="google/mt5-base"
)
model = TranslationTransformer(
    pretrained_model_name_or_path="google/mt5-base",
    n_gram=4,
    smooth=False,
    val_target_max_length=142,
    num_beams=None,
    compute_generate_metrics=True,
)
dm = WMT16TranslationDataModule(
    # WMT translation datasets: ['cs-en', 'de-en', 'fi-en', 'ro-en', 'ru-en', 'tr-en']
    dataset_config_name="ro-en",
    source_language="en",
    target_language="ro",
    max_source_length=128,
    max_target_length=128,
    padding="max_length",
    tokenizer=tokenizer,
)
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)

trainer.fit(model, dm)
```
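The n_gram=4 and smooth=False arguments appear to configure a BLEU score that is computed on generated translations when compute_generate_metrics=True (we believe the library delegates this to a metric such as torchmetrics' BLEUScore, but treat that as an assumption). For intuition, here is a minimal, stdlib-only sketch of unsmoothed corpus-level BLEU with uniform weights over 1- to n-grams; the function names and the plain whitespace tokenization are assumptions made for illustration:

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the contiguous n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(
    candidates: List[str], references: List[List[str]], n_gram: int = 4
) -> float:
    """Unsmoothed corpus-level BLEU with uniform n-gram weights (sketch).

    candidates: one hypothesis string per example.
    references: a list of one or more reference strings per example.
    Tokenization is plain whitespace splitting (an assumption).
    """
    matches = [0] * n_gram  # clipped n-gram matches, per order
    totals = [0] * n_gram  # n-grams in the candidates, per order
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_toks = cand.split()
        ref_toks = [r.split() for r in refs]
        cand_len += len(cand_toks)
        # Effective reference length: the reference closest in length to
        # the candidate (the shorter one wins ties).
        closest = min(ref_toks, key=lambda r: (abs(len(r) - len(cand_toks)), len(r)))
        ref_len += len(closest)
        for n in range(1, n_gram + 1):
            cand_counts = ngram_counts(cand_toks, n)
            max_ref_counts: Counter = Counter()
            for r in ref_toks:
                max_ref_counts |= ngram_counts(r, n)  # element-wise max
            # Clip each candidate n-gram count by its max count in any reference.
            matches[n - 1] += sum(
                min(count, max_ref_counts[g]) for g, count in cand_counts.items()
            )
            totals[n - 1] += sum(cand_counts.values())
    # Without smoothing, a single n-gram order with no matches makes BLEU zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / n_gram
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = ["the cat sat on the mat"]
    refs = [["the cat sat on the mat"]]
    print(corpus_bleu(hyp, refs))  # 1.0
```

The zero-match shortcut shows what smooth=False implies: a short hypothesis with no matching 4-gram scores 0.0 even if its unigrams are all correct, which is what smoothing variants of BLEU are meant to soften.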
Lightning Transformers supports many other 🤗 tasks and datasets; see the documentation.
Large model support is easy to enable for the pre-built LightningModule 🤗 tasks.
The example below enables automatic model partitioning (across CPU and GPU, and even onto disk) to run text generation with a 6B parameter model.
```python
import torch
from accelerate import init_empty_weights
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.language_modeling import (
    LanguageModelingTransformer,
)

# Build the model skeleton without allocating memory for its weights.
with init_empty_weights():
    model = LanguageModelingTransformer(
        pretrained_model_name_or_path="EleutherAI/gpt-j-6B",
        tokenizer=AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B"),
        low_cpu_mem_usage=True,
        device_map="auto",  # automatically partitions the model based on the available hardware.
    )

output = model.generate("Hello, my name is", device=torch.device("cuda"))
print(model.tokenizer.decode(output[0].tolist()))
```
For more information, see Big Transformers Model Inference.
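Conceptually, device_map="auto" fills the available hardware in order: weights go onto the GPU(s) until they are full, then into CPU RAM, and whatever is left is offloaded to disk, while init_empty_weights lets the model skeleton exist before any real weight memory is allocated. The sketch below illustrates only that fill-in-order idea in plain Python. It is a simplified assumption, not accelerate's actual placement algorithm (which handles more details than this), and the helper name greedy_device_map, the layer names and the sizes are all hypothetical:

```python
from typing import Dict, List, Tuple


def greedy_device_map(
    layers: List[Tuple[str, float]], budgets: List[Tuple[str, float]]
) -> Dict[str, str]:
    """Place layers, in order, on the first device that still has room.

    layers:  (layer name, size in MB) in forward-pass order.
    budgets: (device name, capacity in MB) in preference order. The last
             device (e.g. "disk") is treated as unbounded overflow.
    """
    device_map: Dict[str, str] = {}
    dev_idx, used = 0, 0.0
    for name, size in layers:
        # Advance to the next device while this layer would overflow the current one.
        while dev_idx < len(budgets) - 1 and used + size > budgets[dev_idx][1]:
            dev_idx += 1
            used = 0.0
        device_map[name] = budgets[dev_idx][0]
        used += size
    return device_map


if __name__ == "__main__":
    # Hypothetical model made of six equally sized 400 MB pieces.
    layers = [("embed", 400)] + [(f"block.{i}", 400) for i in range(4)] + [("lm_head", 400)]
    budgets = [("cuda:0", 1000), ("cpu", 1000), ("disk", float("inf"))]
    print(greedy_device_map(layers, budgets))
```

With these made-up numbers, the first two pieces land on cuda:0, the next two on cpu and the last two on disk, which mirrors the GPU, then CPU, then disk order described above.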
The example below trains a 6B parameter transformer model with Lightning Transformers and DeepSpeed.
```python
import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.language_modeling import (
    LanguageModelingDataModule,
    LanguageModelingTransformer,
)

# GPT-J uses the same BPE vocabulary as GPT-2, so the gpt2 tokenizer works for the data.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="gpt2")

model = LanguageModelingTransformer(
    pretrained_model_name_or_path="EleutherAI/gpt-j-6B",
    tokenizer=AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B"),
    deepspeed_sharding=True,  # defer model initialization so pre-trained weights can be sharded/loaded
)

dm = LanguageModelingDataModule(
    batch_size=1,
    dataset_name="wikitext",
    dataset_config_name="wikitext-2-raw-v1",
    tokenizer=tokenizer,
)
trainer = pl.Trainer(
    accelerator="gpu",
    devices="auto",
    strategy="deepspeed_stage_3",
    precision=16,
    max_epochs=1,
)

trainer.fit(model, dm)
```
For more information, see DeepSpeed Training with Big Transformers Models or the Model Parallelism documentation.
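A back-of-the-envelope sketch shows why strategy="deepspeed_stage_3" matters at this scale. It uses the per-parameter accounting from the ZeRO paper for mixed-precision Adam training (consistent with precision=16 here, assuming an Adam-style optimizer): 2 bytes of fp16 weights, 2 bytes of fp16 gradients and 12 bytes of fp32 optimizer state (master weights, momentum, variance), i.e. 16 bytes per parameter. Stage 3 partitions all three evenly across the GPUs. Activations, buffers and communication scratch space are deliberately ignored, so these are lower-bound estimates, not measurements, and the helper name zero3_model_state_gb is hypothetical:

```python
def zero3_model_state_gb(num_params: float, num_gpus: int) -> float:
    """Estimated per-GPU memory (decimal GB) for model states under ZeRO stage 3.

    Assumes mixed-precision Adam: 2 (fp16 params) + 2 (fp16 grads)
    + 12 (fp32 master params, momentum, variance) = 16 bytes/param,
    partitioned evenly across num_gpus. Activations are not included.
    """
    bytes_per_param = 2 + 2 + 12
    return num_params * bytes_per_param / num_gpus / 1e9


if __name__ == "__main__":
    for gpus in (1, 4, 8):
        # Prints 96.0, 24.0 and 12.0 GB per GPU for a 6B parameter model.
        print(f"{gpus} GPU(s): {zero3_model_state_gb(6e9, gpus)} GB per GPU")
```

Unsharded, the model states alone (about 96 GB) exceed even an 80 GB GPU; split across 8 GPUs, each one holds roughly 12 GB of them, leaving headroom for activations.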
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
For help or questions, join our huge community on Slack!