
text-mining


Experiments with "Bidirectional Encoder Representations from Transformers" (BERT)

An exploration of the model's inner workings by fine-tuning a BERT encoder on the 10kGNAD dataset.

The code base is largely built on the excellent pytorch-transformers library (https://huggingface.co/pytorch-transformers/index.html).

Dataset

The training data is based on the "Ten Thousand German News Articles Dataset" (10kGNAD) for topic classification: https://tblock.github.io/10kGNAD/

To get the data, clone the Git repository:

git clone https://github.com/tblock/10kGNAD.git
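
For a quick look at the data, the articles can be read with Python's csv module. The sketch below assumes the semicolon-separated train.csv and test.csv files shipped in the cloned repository; the file names and the quote character are assumptions, not taken from this README.

# Minimal sketch for inspecting 10kGNAD; assumes the repository's
# semicolon-separated train.csv/test.csv with ' as quote character.
import csv
from collections import Counter

def load_gnad(path):
    """Read a 10kGNAD csv file into (label, text) pairs."""
    samples = []
    with open(path, encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=";", quotechar="'")
        for row in reader:
            samples.append((row[0], row[1]))
    return samples

train = load_gnad("10kGNAD/train.csv")
print(len(train), "training articles")
print(Counter(label for label, _ in train))  # class distribution over the nine topics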

Class distribution of the dataset:


Model training

Training can be started from the command line:

$ python pipeline.py --train .\data --test .\data --model-dir .\trained --output-data-dir .\output --per_gpu_train_batch_size 3

(All default hyperparameters can be overridden via the CLI.)
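
As a rough illustration only (the actual pipeline.py may be wired differently), such overrides are typically exposed through argparse:

# Hypothetical sketch of the CLI wiring; argument names follow the example call
# above, the default values are assumptions.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--train", required=True, help="directory containing the training data")
parser.add_argument("--test", required=True, help="directory containing the test data")
parser.add_argument("--model-dir", default="./trained", help="where the fine-tuned model is saved")
parser.add_argument("--output-data-dir", default="./output", help="where metrics and plots are written")
parser.add_argument("--per_gpu_train_batch_size", type=int, default=8)  # set to 3 in the example call
args = parser.parse_args()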

Hyperparameters:

Param Value
attention_probs_dropout_prob 0.1
hidden_act "gelu"
hidden_dropout_prob 0.1
hidden_size 768
initializer_range 0.02
intermediate_size 3072
layer_norm_eps 1e-12
max_position_embeddings 512
num_attention_heads 12
num_hidden_layers 12
num_labels 9
output_attentions false
output_hidden_states true
torchscript false
type_vocab_size 2
vocab_size 30000
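
These values describe a BERT-base sized encoder with a 30000-token vocabulary. As a sketch, they map onto a pytorch-transformers BertConfig as shown below; the bert-base-german-cased checkpoint and the use of BertForSequenceClassification are assumptions, not stated in the table.

# Sketch: the hyperparameters above expressed as a pytorch-transformers BertConfig.
# Only the listed values are taken from the table; the checkpoint name is an assumption.
from pytorch_transformers import BertConfig, BertForSequenceClassification

config = BertConfig(
    vocab_size_or_config_json_file=30000,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    num_labels=9,               # the nine 10kGNAD topic classes
    output_hidden_states=True,  # keeps per-layer activations for later inspection
)

# In practice the weights come from a pretrained checkpoint rather than random init:
model = BertForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=9, output_hidden_states=True
)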

Results

Metric Value
acc 0.8929961089494164
acc_and_f1 0.8929961089494164
f1_macro 0.8902614321693273
f1_micro 0.8929961089494164
loss 0.6125384846398997
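
A sketch of how metrics of this kind can be computed with scikit-learn; treating acc_and_f1 as the mean of accuracy and F1 is an assumption about the evaluation code. (For single-label multiclass data, micro-averaged F1 equals plain accuracy, which is why acc, f1_micro and acc_and_f1 coincide above.)

# Sketch of the evaluation metrics using scikit-learn (assumed, not the project's code).
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    f1_micro = f1_score(y_true, y_pred, average="micro")  # equals accuracy for single-label multiclass
    return {
        "acc": acc,
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
        "f1_micro": f1_micro,
        "acc_and_f1": (acc + f1_micro) / 2,
    }

print(compute_metrics([0, 1, 2, 2], [0, 1, 2, 1]))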


Embeddings

Test sentences, all containing the ambiguous word "Schloss" (castle / lock):

"Die Prinzessin wohnt im prunkvollen Schloss", 
"Das Schloss wurde im Mittelalter erbaut",
"Die Türe benötigt dringend ein neue Schloss",
"Das Schloss der Kiste ist schon lange defekt"]
