COMP 7970: Natural Language Processing

Fall 2021 / Auburn University

This repository contains my implementations of the assignments and projects for COMP 7970: Natural Language Processing.

Assignment 1: Latent Dirichlet Allocation Implementation

✅ Q1: Implementation of LDA

✅ Q2: Visualization of Topics

✅ Q3: Implement KL-divergence
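For reference, KL-divergence between two discrete distributions can be sketched as below. This is a minimal illustration, not the code used in the assignment: the function name `kl_divergence`, the `eps` guard, and the two toy topic-word distributions are all assumptions made for the example.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:  # terms with p_i == 0 contribute nothing to the sum
            # eps (an assumption of this sketch) avoids division by zero when q_i == 0
            total += p_i * math.log(p_i / max(q_i, eps))
    return total

# Hypothetical topic-word distributions over a four-word vocabulary.
topic_a = [0.5, 0.3, 0.1, 0.1]
topic_b = [0.1, 0.1, 0.3, 0.5]

print(kl_divergence(topic_a, topic_a))  # identical distributions
print(kl_divergence(topic_a, topic_b))  # dissimilar distributions
```

KL-divergence is not symmetric, so `kl_divergence(topic_a, topic_b)` and `kl_divergence(topic_b, topic_a)` generally differ.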

Assignment 2: Implementation of Word2Vec

In this assignment, I implemented a custom Word2Vec model following the paper Efficient Estimation of Word Representations in Vector Space. The Skip-gram model is implemented in PyTorch with a word embedding size of 50. It was trained on the Amazon Fine Food Reviews dataset for up to 10 epochs, and the checkpoint file can be found here.
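The Skip-gram objective trains the model to predict context words from a center word, so the training data is a set of (center, context) pairs. The sketch below shows only that pair-generation step in plain Python. The function name `skipgram_pairs`, the window size, and the sample sentence are assumptions for illustration and may differ from the repository's PyTorch code.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical review snippet, tokenized by whitespace.
sentence = "this coffee tastes great".split()
for center, context in skipgram_pairs(sentence, window=1):
    print(center, "->", context)
```

In a full model, each pair would be mapped to vocabulary indices and fed to an embedding layer followed by a softmax (or a negative-sampling loss) over the vocabulary.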

✅ Q1: Implementation of Word2Vec

✅ Q2: Find Similar Words by loading the saved model

  • Coffee: craving
  • Tuna: sudorific
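Similar-word lookup is commonly done by ranking the vocabulary by cosine similarity to the query word's embedding. The following is a minimal sketch under that assumption. The three-dimensional `toy_embeddings` are made up for the example and are not taken from the trained checkpoint.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings, top_k=1):
    """Rank all other words by cosine similarity to `word`."""
    query = embeddings[word]
    scored = [(other, cosine(query, vec))
              for other, vec in embeddings.items() if other != word]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings; a real model here would use 50-dimensional vectors.
toy_embeddings = {
    "coffee": [0.9, 0.1, 0.0],
    "espresso": [0.8, 0.2, 0.1],
    "tuna": [0.0, 0.9, 0.4],
    "salmon": [0.1, 0.8, 0.5],
}

print(most_similar("coffee", toy_embeddings))
print(most_similar("tuna", toy_embeddings))
```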

✅ Q3: Word Analogies with GloVe

In this part, I used GloVe 300d vectors and tested different word analogies. For example:

  • Spain is to Spanish as Germany is to german
  • Japan is to Tokyo as France is to paris
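Analogies of the form "a is to b as c is to ?" are typically solved with vector arithmetic: compute b - a + c and return the nearest remaining word by cosine similarity. The sketch below illustrates that idea. The `toy_vectors` are hand-built for the example and stand in for real GloVe vectors, and the function name `analogy` is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, embeddings):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # The three query words are excluded, as is common practice for this test.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

# Hypothetical 3-d vectors: the last axis encodes "language of" a country.
toy_vectors = {
    "spain": [1.0, 0.0, 0.0],
    "spanish": [1.0, 0.0, 1.0],
    "germany": [0.0, 1.0, 0.0],
    "german": [0.0, 1.0, 1.0],
    "france": [0.5, 0.5, 0.0],
    "french": [0.5, 0.5, 1.0],
}

print(analogy("spain", "spanish", "germany", toy_vectors))
```

GloVe vocabularies are usually lowercased, which would explain lowercase answers such as "german" and "paris".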

Assignment 3: Implementation of Text Summarization

In this assignment, I implemented two different text summarization models using 1) Google PEGASUS and 2) Facebook BART. I started from the pretrained models on Hugging Face and fine-tuned them on the CNN/DailyMail dataset.

✅ Q1: Implementation of Text Summarizer

| Model | Script | URL for fine-tuned model on Hugging Face |
| --- | --- | --- |
| Pegasus | Pegasus Python script | https://huggingface.co/Mousumi/finetuned_pegasus |
| BART | BART Python script | https://huggingface.co/Mousumi/finetuned_bart |

✅ Q2: Evaluation of Text Summarizer

To run test.py:

For Pegasus:

python test.py --model=pegasus

For BART:

python test.py --model=bart

Results:

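| Model | ROUGE-1 Precision | ROUGE-1 Recall | ROUGE-1 F1 | ROUGE-2 Precision | ROUGE-2 Recall | ROUGE-2 F1 | ROUGE-3 Precision | ROUGE-3 Recall | ROUGE-3 F1 | ROUGE-L Precision | ROUGE-L Recall | ROUGE-L F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pegasus | 28.36 | 50.72 | 34.1 | 11.74 | 20.55 | 14.0 | 6.63 | 11.41 | 7.79 | 18.37 | 33.83 | 22.25 |
| BART | 56.51 | 15.4 | 23.64 | 26.55 | 6.77 | 10.52 | 14.93 | 3.58 | 5.61 | 46.1 | 12.56 | 19.27 |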
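To make the precision, recall, and F1 columns concrete, the sketch below computes ROUGE-N from clipped n-gram overlap between a candidate summary and a reference. It is a simplified illustration with whitespace tokenization and no stemming. It is not the evaluation code in test.py, and the function names and example sentences are assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) for ROUGE-N using clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical reference and candidate summaries.
reference = "the cat sat on the mat"
candidate = "the cat sat"

print(rouge_n(candidate, reference, n=1))
print(rouge_n(candidate, reference, n=2))
```

In this toy case the candidate is short and fully contained in the reference, so precision is high while recall is low. A long candidate that covers most of the reference shows the opposite pattern.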

References:

1. Model sharing and uploading, Hugging Face documentation