This repository is a fork of the official implementation of *DeBERTa: Decoding-enhanced BERT with Disentangled Attention* and *DeBERTa V3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing*.
This fork extends DeBERTa with a hierarchical architecture that combines the advantages of character and word tokenizers, providing word-level self-attention with an unlimited vocabulary, as described in *From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding*. The implementation combines these advantages with the DeBERTa architecture. To enable it, pass `--token_format char_to_word` to the data preparation and training scripts. Three token formats are supported:
- `--token_format char`: character-level tokenization
- `--token_format subword`: the original subword tokenization used by DeBERTa
- `--token_format char_to_word`: the hierarchical character-to-word tokenization described above
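
As a minimal sketch of how the flag might be passed end to end, the commands below show a hierarchical run; the script names, paths, and other arguments are illustrative assumptions, and only `--token_format` comes from this fork:

```bash
# Illustrative sketch: prepare_data.py and train.py are placeholder
# script names, not the repository's exact entry points; only the
# --token_format flag is taken from this fork.

# Prepare the pre-training corpus with character-to-word tokenization.
python prepare_data.py \
  --input corpus.txt \
  --output data/ \
  --token_format char_to_word

# Train with the same token format so the data and the model agree.
python train.py \
  --data_dir data/ \
  --token_format char_to_word
```

The same `--token_format` value must be used at both stages, since the data preparation step determines how the corpus is segmented before the model consumes it.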