LemurPwned/nlp

Finetuning the library models for speech recognition

In this notebook we fine-tune the Whisper model on the DR_VCTK dataset for speaker recognition.

Accuracy with a simple linear model:

(clean data) accuracy: 0.998
(noisy data) accuracy: 0.984

Using the embeddings with a linear SVM gives roughly 0.88 accuracy on the clean data.

Shakespeare dataset GPT-copy

Tiny GPT-like model trained on the Shakespeare dataset on a puny RTX 3080 GPU.

Validation loss: 0.547 with the tiktoken tokenizer.
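To put the loss figure in context, here is a minimal sketch that converts a loss into per-token perplexity. It assumes the reported value is the mean cross-entropy per token in nats (the usual convention for GPT-style training loops); the README does not state this, and the helper names `mean_cross_entropy` and `perplexity` are illustrative only.

```python
import math


def mean_cross_entropy(correct_token_probs):
    """Mean negative log-likelihood (in nats) over the probabilities
    the model assigned to the correct next tokens."""
    total = sum(math.log(p) for p in correct_token_probs)
    return -total / len(correct_token_probs)


def perplexity(loss_nats):
    """Perplexity is the exponential of the mean cross-entropy."""
    return math.exp(loss_nats)


# ASSUMPTION: 0.547 is a per-token mean cross-entropy in nats.
print(round(perplexity(0.547), 3))
```

Under that assumption, a loss of 0.547 corresponds to a perplexity of roughly 1.73 tokens. Losses measured with different tokenizers are not directly comparable, since the number of tokens per character differs.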

Wikipedia dataset GPT-copy

Tiny GPT-like model trained on the Wikipedia dataset on a puny RTX 3080 GPU.

TODOs

beam search decoding (soft + greedy)
information retrieval extension with infoNCE
text retrieval image model
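For the beam search item, here is a rough sketch of what the decoder could look like; it is not code from this repository. It assumes a hypothetical `step_fn(seq)` that returns next-token log-probabilities as a dict, and uses a toy lookup table in place of a real model. Greedy decoding falls out as the special case `beam_width=1`.

```python
import math


def beam_search(step_fn, start, beam_width, max_len, eos):
    """Return the (sequence, log-prob) pair with the highest total score.

    step_fn(seq) -> {token: log_prob} is a hypothetical model interface.
    """
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by one token.
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq).items():
                candidates.append((seq + [tok], score + logp))
        # Keep only the best `beam_width` expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Hypotheses that hit max_len without emitting eos still compete.
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])


# Toy "model": next-token probabilities depend only on the last token.
TOY = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
}


def toy_step(seq):
    probs = TOY.get(seq[-1], {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy_seq, greedy_score = beam_search(toy_step, "<s>", 1, 5, "</s>")
beam_seq, beam_score = beam_search(toy_step, "<s>", 2, 5, "</s>")
print(greedy_seq, round(math.exp(greedy_score), 2))
print(beam_seq, round(math.exp(beam_score), 2))
```

In this toy table the greedy path commits to `a` (probability 0.6) and ends with total probability 0.30, while a beam of width 2 keeps `b` alive and finds the `b z` path at 0.36. A real implementation would batch the hypotheses on the GPU and usually apply length normalization to the scores.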