/nlp

Primary LanguageJupyter Notebook

Finetuning the library models for speech recognition

In this notebook we fine-tune the whisper task to the DR_VCTK speaker recognition dataset.

Here's the score with the simple linear model:

  • (clean data) accuracy: 0.998
  • (noisy data) accuracy: 0.984

Using the embeddings and Linear SVM gives about ~0.88 accuracy on the clean data.


Shakespeare dataset GPT-copy

Tiny GPT-like model trained on the Shakespeare dataset on a puny RTX3080 GPU.

  • validation loss: 0.547 with tiktoken tokenizer.

Wikipedia dataset GPT-copy

Tiny GPT-like model trained on the Wikipedia dataset on a puny RTX3080 GPU.

TODOs

  • beam search decoding (soft + greedy)
  • information retrieval extension with infoNCE
  • text retrieval image model