Fall 2021 / Auburn University
This repository contains my implementations of the assignments and projects for COMP 7970: Natural Language Processing.
✅ Q1: Implementation of LDA
✅ Q2: Visualization Of Topics
✅ Q3: Implement KL-divergence
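
For reference, KL-divergence between two discrete distributions can be sketched as below. This is a minimal illustration, not the code from this repository: it assumes the inputs are already-normalized probability lists (for example, two topic-word distributions), and the function name `kl_divergence`, the `eps` floor, and the sample values are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions of equal length.

    Assumption: p and q each sum to 1. `eps` is a hypothetical floor that
    avoids division by zero when q has a zero entry.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with p_i = 0 contribute 0 by convention
            total += pi * math.log(pi / max(qi, eps))
    return total

# Made-up distributions over a three-word vocabulary
topic_a = [0.5, 0.3, 0.2]
topic_b = [0.4, 0.4, 0.2]

print(kl_divergence(topic_a, topic_b))  # small positive value
print(kl_divergence(topic_b, topic_a))  # differs: KL is not symmetric
print(kl_divergence(topic_a, topic_a))  # 0.0
```

Because KL-divergence is asymmetric, the order of the arguments matters when comparing topics.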
In this assignment, I implemented a custom Word2Vec model following the paper "Efficient Estimation of Word Representations in Vector Space". The Skip-gram model is implemented in PyTorch with a word embedding size of 50. The model was trained on the Amazon Fine Food Reviews dataset for up to 10 epochs, and the checkpoint file can be found here
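
To illustrate what a Skip-gram model is trained on, the sketch below generates (center, context) word pairs from a tokenized sentence. It is a simplified example and not the repository's code: the window size, the whitespace tokenization, and the helper name `skipgram_pairs` are assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbors.

    Assumption: a symmetric context window of `window` tokens on each side.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "this coffee tastes great".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs)
# [('this', 'coffee'), ('coffee', 'this'), ('coffee', 'tastes'),
#  ('tastes', 'coffee'), ('tastes', 'great'), ('great', 'tastes')]
```

During training, the model learns embeddings such that each center word predicts its context words.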
✅ Q1: Implementation of Word2Vec
✅ Q2: Find Similar Words by loading the saved model
- Coffee: craving
- Tuna: sudorific
✅ Q3: Word Analogies with GloVe
In this part, I used 300-dimensional GloVe vectors to test different word analogies. For example:
- Spain is to Spanish as Germany is to german
- Japan is to Tokyo as France is to paris
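
Analogies like these are commonly solved with vector arithmetic: compute `b - a + c` and return the vocabulary word closest to the result by cosine similarity. The sketch below shows the idea under stated assumptions: the three-dimensional vectors are made up for illustration (they are not real GloVe values), and the helper names `cosine` and `analogy` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words so the answer is a new word
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Toy embeddings, invented for this example only
toy = {
    "spain":   [1.0, 0.0, 0.0],
    "spanish": [1.0, 1.0, 0.0],
    "germany": [0.0, 0.0, 1.0],
    "german":  [0.0, 1.0, 1.0],
    "tokyo":   [0.5, -1.0, 0.2],
}

print(analogy("spain", "spanish", "germany", toy))  # german
```

With real GloVe vectors, `toy` would be replaced by a dictionary loaded from the pretrained embedding file.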
In this assignment, I implemented two text summarization models using 1) Google PEGASUS and 2) Facebook BART. I started from the pretrained models on Hugging Face and fine-tuned them on the CNN/DailyMail dataset.
✅ Q1: Implementation of Text Summarizer
Model | Script | URL for Fine-tuned Model on Hugging Face |
---|---|---|
Pegasus | Pegasus python Script | https://huggingface.co/Mousumi/finetuned_pegasus |
BART | BART python Script | https://huggingface.co/Mousumi/finetuned_bart |
✅ Q2: Evaluation of Text Summarizer
To run test.py:
python test.py --model=pegasus
python test.py --model=bart
Model | ROUGE-1 Precision | ROUGE-1 Recall | ROUGE-1 F1 | ROUGE-2 Precision | ROUGE-2 Recall | ROUGE-2 F1 | ROUGE-3 Precision | ROUGE-3 Recall | ROUGE-3 F1 | ROUGE-L Precision | ROUGE-L Recall | ROUGE-L F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Pegasus | 28.36 | 50.72 | 34.1 | 11.74 | 20.55 | 14.0 | 6.63 | 11.41 | 7.79 | 18.37 | 33.83 | 22.25 |
BART | 56.51 | 15.4 | 23.64 | 26.55 | 6.77 | 10.52 | 14.93 | 3.58 | 5.61 | 46.1 | 12.56 | 19.27 |
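
For readers unfamiliar with the metrics in this table, the sketch below shows how ROUGE-N precision, recall, and F1 are defined for a single candidate/reference pair. It is a simplified illustration and not necessarily how `test.py` computes its scores: it assumes lowercased whitespace tokenization with no stemming, and the helper names `ngrams` and `rouge_n` are hypothetical. ROUGE-L, which is based on the longest common subsequence, is not covered here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap.

    Assumption: lowercased whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both texts
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

p, r, f1 = rouge_n("the cat sat", "the cat sat on the mat", n=1)
print(p, r, round(f1, 4))  # 1.0 0.5 0.6667
```

Precision divides the overlap by the number of n-grams in the generated summary, while recall divides it by the number in the reference, so a short summary can score high precision and low recall at the same time.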