In this notebook we fine-tune Whisper on the DR_VCTK speaker recognition dataset.
Here are the scores with a simple linear model:
- (clean data) accuracy: 0.998
- (noisy data) accuracy: 0.984
Using the embeddings with a linear SVM gives about 0.88 accuracy on the clean data.
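To make the embeddings-plus-linear-SVM baseline concrete, here is a minimal sketch in pure Python. It is not the notebook's code: the 2-D toy "embeddings", the three synthetic speakers, and the helper names (`train_linear_svm`, `predict`, `make_toy_embeddings`) are all assumptions for illustration, and the from-scratch SGD trainer stands in for a library implementation such as scikit-learn's `LinearSVC`.

```python
import random


def train_linear_svm(X, y, n_classes, epochs=50, lr=0.05, lam=0.01, seed=0):
    """One-vs-rest linear SVM trained with SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]  # one weight vector per class
    b = [0.0] * n_classes                        # one bias per class
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            for c in range(n_classes):
                target = 1.0 if y[i] == c else -1.0
                score = sum(w * x for w, x in zip(W[c], X[i])) + b[c]
                # L2 shrinkage is applied on every step.
                W[c] = [w * (1.0 - lr * lam) for w in W[c]]
                if target * score < 1.0:
                    # Margin violated: take a hinge-loss subgradient step.
                    W[c] = [w + lr * target * x for w, x in zip(W[c], X[i])]
                    b[c] += lr * target
    return W, b


def predict(W, b, x):
    """Return the class whose linear score is highest."""
    scores = [sum(w * xi for w, xi in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def make_toy_embeddings(n_per_class, seed):
    """Hypothetical stand-in for real speaker embeddings: three Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 4.0), (-4.0, -2.0), (4.0, -2.0)]
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append([cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)])
            y.append(label)
    return X, y


X_train, y_train = make_toy_embeddings(30, seed=1)
X_test, y_test = make_toy_embeddings(30, seed=2)
W, b = train_linear_svm(X_train, y_train, n_classes=3)
accuracy = sum(predict(W, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"toy accuracy: {accuracy:.3f}")
```

With real data, `X_train` would hold one fixed-size embedding per utterance and `y_train` the speaker labels. The toy accuracy printed here says nothing about the figures reported above.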
Tiny GPT-like model trained on the Shakespeare dataset on a puny RTX3080 GPU.
- validation loss: 0.547 with the tiktoken tokenizer.
Tiny GPT-like model trained on the Wikipedia dataset on a puny RTX3080 GPU.
- beam search decoding (soft + greedy)
- information retrieval extension with InfoNCE
- text retrieval image model
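
The InfoNCE objective named in the list above can be sketched as follows. This is an illustrative pure-Python version, not the notebook's implementation: the function name `info_nce`, the use of cosine similarity, the in-batch negatives, and the temperature of 0.1 are all assumptions.

```python
import math


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    keys[i] is treated as the positive for queries[i]; every other key in
    the batch serves as a negative (in-batch negatives, an assumption here).
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i with every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching key as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(q)


# Matching pairs give a loss near zero; mismatched pairs give a large loss.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In a retrieval setting, `queries` and `keys` would be the embeddings of paired items (for example a query and its relevant document), so minimizing the loss pulls each pair together and pushes it away from the other items in the batch.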