Primary language: Jupyter Notebook. License: Apache-2.0.

Transformers-ru

A list of pretrained Transformer models for the Russian language (including multilingual models).

The code for using and visualizing the models comes from the following repositories:

Models

The models come from:

| Model description | # params | Config | Vocabulary | Model | BPE codes |
|---|---|---|---|---|---|
| BERT-Base, Multilingual Cased: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters | 170M | [huggingface] 1K | [huggingface] 973K | [huggingface] 682M | |
| BERT-Base, Multilingual Uncased: 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters | 160M | [huggingface] 1K | [huggingface] 852K | [huggingface] 642M | |
| RuBERT, Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters | 170M | | | [deeppavlov] 636M | |
| SlavicBERT, Slavic (bg, cs, pl, ru), cased, 12-layer, 768-hidden, 12-heads, 180M parameters | 170M | | | [deeppavlov] 636M | |
| XLM (MLM) 15 languages | 237M | [huggingface] 1K | [huggingface] 2.9M, [facebook] 1.5M | [huggingface] 1.3G, [facebook] 1.3G | [huggingface] 1.4M, [facebook] 1.4M |
| XLM (MLM+TLM) 15 languages | 237M | [huggingface] 1K | [huggingface] 2.9M, [facebook] 1.5M | [huggingface] 661M, [facebook] 665M | [huggingface] 1.4M, [facebook] 1.4M |
| XLM (MLM) 17 languages | | | [facebook] 2.9M | [facebook] 1.1G | [facebook] 2.9M |
| XLM (MLM) 100 languages | | | [facebook] 3.0M | [facebook] 1.1G | [facebook] 2.9M |
| Denis Antyukhov BERT-Base, Russian, Uncased, 12-layer, 768-hidden, 12-heads | 176M | | | [bert_resourses] 1.9G | |
| Facebook-FAIR's WMT'19 en-ru | | | | [fairseq] 12G | |
| Facebook-FAIR's WMT'19 ru-en | | | | [fairseq] 12G | |
| Facebook-FAIR's WMT'19 ru | | | | [fairseq] 2.1G | |
| Russian RuBERTa | | | | [Google Drive] 247M | |
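
The model descriptions quote 110M parameters for a 12-layer, 768-hidden BERT-Base, while the # params column gives larger figures for the multilingual models. Most of that gap comes from the size of the embedding table. The sketch below counts the weights of a BERT encoder from its dimensions. The vocabulary sizes used (30,522 for the English BERT-Base and 119,547 for the multilingual cased checkpoint) are assumptions taken from the published checkpoints and are not stated in this README; the function name is hypothetical.

```python
def bert_param_count(vocab_size, hidden=768, layers=12, intermediate=3072,
                     max_positions=512, type_vocab=2):
    """Count the weights of a BERT encoder (embeddings, layers, pooler).

    The number of attention heads does not change the count, because the
    heads only split the hidden dimension.
    """
    layer_norm = 2 * hidden  # scale and bias
    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections, followed by one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden) and one LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# Assumed vocabulary sizes (not given in this README).
print(f"{bert_param_count(30522) / 1e6:.1f}M")   # 109.5M
print(f"{bert_param_count(119547) / 1e6:.1f}M")  # 177.9M
```

Under these assumptions, the encoder layers contribute about 85M weights in both cases, and the remainder is almost entirely the token embedding matrix, which grows with the vocabulary.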

Converting TensorFlow models to PyTorch

Downloading and converting the DeepPavlov model:

$ wget 'http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz'
$ tar -xzf rubert_cased_L-12_H-768_A-12_v1.tar.gz
$ python3 convert_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path rubert_cased_L-12_H-768_A-12_v1/bert_model.ckpt \
    --bert_config_file rubert_cased_L-12_H-768_A-12_v1/bert_config.json \
    --pytorch_dump_path rubert_cased_L-12_H-768_A-12_v1/bert_model.bin
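
When several archives need converting, the same command can be assembled programmatically. This is a minimal sketch, assuming every extracted archive has the same layout as the RuBERT one above (`bert_model.ckpt` and `bert_config.json` in the top-level directory) and that `convert_tf_checkpoint_to_pytorch.py` is in the working directory. The helper names `conversion_command` and `convert` are hypothetical.

```python
import subprocess
from pathlib import Path


def conversion_command(model_dir, script="convert_tf_checkpoint_to_pytorch.py"):
    """Build the argument list that converts one extracted BERT archive."""
    model_dir = Path(model_dir)
    return [
        "python3", script,
        "--tf_checkpoint_path", str(model_dir / "bert_model.ckpt"),
        "--bert_config_file", str(model_dir / "bert_config.json"),
        "--pytorch_dump_path", str(model_dir / "bert_model.bin"),
    ]


def convert(model_dir):
    """Run the conversion; raises CalledProcessError if the script fails."""
    subprocess.run(conversion_command(model_dir), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(conversion_command("rubert_cased_L-12_H-768_A-12_v1")))
```

Passing the arguments as a list avoids shell quoting problems with directory names.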

Models comparison

Scripts are provided to train and evaluate models on the Sber SQuAD dataset for the Russian language [download dataset].

Comparison of BERT models trained on the Sber SQuAD dataset:

| Model | EM (dev) | F-1 (dev) |
|---|---|---|
| BERT-Base, Multilingual Cased | 64.85 | 83.68 |
| BERT-Base, Multilingual Uncased | 64.73 | 83.25 |
| RuBERT | 66.38 | 84.58 |
| SlavicBERT | 65.23 | 83.68 |
| RuBERTa-base | 59.45 | 78.60 |
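
For readers unfamiliar with the two metrics, the sketch below shows how SQuAD-style exact match (EM) and token-level F-1 are commonly computed for a single prediction. It is a simplified illustration, not the evaluation script used for the numbers above: it assumes lowercasing, removal of ASCII punctuation and whitespace tokenization, and it compares against a single reference answer. The function names are hypothetical.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Harmonic mean of token precision and recall against the reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Москва.", "москва"))           # 1.0
print(round(f1_score("в Москве", "Москве"), 2))   # 0.67
```

Scores like those in the table are typically these per-question values averaged over the dev set and reported as percentages.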

Visualization

The attention-head view visualization from BertViz:

[Notebook]

The model view visualization from BertViz:

[Notebook]

The neuron view visualization from BertViz:

[Notebook]

Generative models

GPT-2 models

Mikhail Grankin's model

Code: https://github.com/mgrankin/ru_transformers

Download models:

pip install awscli
aws s3 sync --no-sign-request s3://models.dobro.ai/gpt2/ru/unfreeze_all gpt2

Vladimir Larin's model

RNN Models

Some RNN models are also available for the Russian language.

ELMo

  • RNC and Wikipedia. December 2018 (tokens): [model]
  • RNC and Wikipedia. December 2018 (lemmas): [model]
  • Taiga 2048. December 2019 (lemmas): [model]

ULMFiT