LittleBird is a sparse attention model from Kakao Enterprise Corp. that improves on BigBird by reducing the memory footprint and improving speed while maintaining accuracy. The model combines BigBird's sliding window attention with LUNA's pack and unpack attention, and adds a custom bi-directional positional representation method based on ALiBi. As of 2022.03.08, the model sits in first place on the KorQuAD 2.0 test set.
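For intuition, the positional scheme can be thought of as an ALiBi-style linear distance penalty applied in both directions rather than only to the left. The snippet below is a simplified, symmetric illustration of that idea only; it is not the BiALiBi used in the paper or in this repository, which differs in details such as how slopes are chosen.

```python
# Simplified illustration of a bi-directional, ALiBi-style distance bias.
# NOT the BiALiBi implementation from the paper or this repository; it only
# shows the core idea: attention scores are penalized linearly with token
# distance, in both directions.
import torch

def bidirectional_alibi_bias(seq_len: int, slope: float) -> torch.Tensor:
    positions = torch.arange(seq_len)
    distance = (positions[None, :] - positions[:, None]).abs()  # |i - j|
    return -slope * distance  # added to the attention scores before the softmax

bias = bidirectional_alibi_bias(8, slope=0.5)
print(bias)
```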
During development, I used Python 3.8 and PyTorch 1.12. You can install from source as follows:

```bash
pip install .
```
A minimal usage example:

```python
import torch

from littlebird import LittleBirdModel

seq_len = 6144      # maximum sequence length handled by the model
pack_len = 64       # length of the packed sequence used by the pack/unpack attention
vocab_size = 30_000
embed_dim = 768
block_size = 64     # block size for the sliding window attention

# Toy batch: ten real tokens followed by padding up to seq_len.
encoded = {
    "input_ids": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + [0] * (seq_len - 10)],
    "attention_mask": [[1] * 10 + [0] * (seq_len - 10)],
}

m = LittleBirdModel(seq_len, pack_len, vocab_size, embed_dim, block_size=block_size)
m(
    torch.as_tensor(encoded["input_ids"]),
    torch.as_tensor(encoded["attention_mask"]).bool(),  # attention mask as a boolean tensor
)
```
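In practice, `encoded` would come from a tokenizer rather than being built by hand. Below is a hypothetical sketch using a Hugging Face tokenizer; the `transformers` dependency, the `bert-base-uncased` checkpoint, and the sample sentence are illustrative choices, not part of this repository.

```python
# Hypothetical usage with a Hugging Face tokenizer (not part of this repository).
# Any tokenizer works as long as the model's vocab_size covers its vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative choice

encoded = tokenizer(
    "LittleBird handles long documents with sparse attention.",
    padding="max_length",   # pad up to the model's full sequence length
    truncation=True,
    max_length=seq_len,
    return_tensors="pt",
)

m = LittleBirdModel(seq_len, pack_len, tokenizer.vocab_size, embed_dim, block_size=block_size)
m(encoded["input_ids"], encoded["attention_mask"].bool())
```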
There may be minor issues with the implementation, and I greatly appreciate all contributions. If you have any issues, bug reports, or feature requests, please open an issue on GitHub. Alternatively, feel free to fork this repository and submit a pull request with your changes.
• LittleBird: Efficient Faster & Longer Transformer for Question Answering
• Big Bird: Transformers for Longer Sequences
• Luna: Linear Unified Nested Attention
• ALiBi: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Teryn Jones