AkhilD1/Stories-Summarization-Q-A-System

Stories - Summarization and Question-Answering

Scraping

The Scrapy projects that download Aesop's fables and Reedsy short stories are in the scrapers directory.

Run "scrapy crawl BedtimeStories -o bedtimestories.jsonl" to retrieve the Reedsy short stories.
Run "scrapy crawl aesop -o aesopfables.jsonl" to retrieve the Aesop fables.

Scripts

Run AesopCreateCSV.py to clean the Aesop fables and export them to a CSV file.
Run CleanDataAndCreateCSV.py to clean the Reedsy short stories and export them to a CSV file.
The generated CSV files are stored in data/aesop and data/bedtimestories, respectively.
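
The cleaning scripts themselves are not reproduced here. As a rough sketch of the JSONL-to-CSV step, assuming each scraped record carries "title" and "text" fields (hypothetical names; the actual spiders may emit different ones) and that cleaning amounts to collapsing whitespace, the conversion could look something like this:

```python
import csv
import io
import json
import re


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def jsonl_to_csv(jsonl_lines, out_file, fields=("title", "text")):
    """Write one CSV row per non-empty JSON record and return the row count.

    The field names are assumptions made for illustration only.
    """
    writer = csv.DictWriter(out_file, fieldnames=list(fields))
    writer.writeheader()
    count = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL input
        record = json.loads(line)
        row = {f: clean_text(str(record.get(f, ""))) for f in fields}
        if not any(row.values()):
            continue  # skip records whose fields are all empty after cleaning
        writer.writerow(row)
        count += 1
    return count


# Small in-memory demonstration; a real run would pass open file handles.
sample = ['{"title": "The Fox", "text": "A  fox\\n saw grapes."}']
buffer = io.StringIO()
jsonl_to_csv(sample, buffer)
print(buffer.getvalue())
```

For real files, the same function can be called with open("aesopfables.jsonl") as input and open(..., "w", newline="") as output, as the csv module recommends.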

Data Annotations

aesopfables.csv is split into aesopfables-train.csv and aesopfables-test.csv.
bedtimestories.csv is split into bedtimestories_train.csv and bedtimestories_test.csv.
The train and test files above are annotated using the Haystack Annotation Tool.
The annotations in SQuAD format are stored in data/qa-squad.
The annotations in CSV format are stored in data/qa-csv.
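
To illustrate how the two annotation formats relate, here is a minimal sketch that flattens a SQuAD-style dictionary into one row per answer. The nesting (data, paragraphs, qas, answers) follows the standard SQuAD layout; the output column names are assumptions and may not match the files in data/qa-csv.

```python
def flatten_squad(squad: dict) -> list:
    """Return one flat dict per (question, answer) pair in a SQuAD-style dict."""
    rows = []
    for article in squad.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph.get("qas", []):
                for answer in qa.get("answers", []):
                    rows.append(
                        {
                            # Column names below are hypothetical.
                            "title": article.get("title", ""),
                            "question": qa["question"],
                            "answer_text": answer["text"],
                            "answer_start": answer["answer_start"],
                            "context": context,
                        }
                    )
    return rows


# Tiny hand-written example in SQuAD layout.
context = "A hungry fox saw some grapes."
example = {
    "data": [
        {
            "title": "The Fox and the Grapes",
            "paragraphs": [
                {
                    "context": context,
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What did the fox see?",
                            "answers": [
                                {"text": "grapes", "answer_start": context.index("grapes")}
                            ],
                        }
                    ],
                }
            ],
        }
    ]
}

for row in flatten_squad(example):
    print(row["question"], "->", row["answer_text"])
```

The resulting rows can be written out with csv.DictWriter. Questions with no answers produce no rows in this sketch.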

Data Summarization

Around 50 Aesop fables are summarized and stored in aesopfables-summaries.csv for model evaluation.
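
The evaluation metric is not stated here. One common way to compare a generated summary against a reference summary is unigram overlap in the style of ROUGE-1, and the sketch below assumes that choice. It is a simplified illustration (lowercasing, no stemming), so its scores will not exactly match those of standard ROUGE packages.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap precision, recall and F1 between two summaries."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1(
    "The fox could not reach the grapes",  # reference summary
    "The fox wanted the grapes",           # model output
)
print(scores)
```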

Models - Summarization

notebooks/summarization_abstractive_bart.ipynb
notebooks/summarization_abstractive_long_t5_tglobal_xl.ipynb
notebooks/summarization_abstractive_pegasus.ipynb
notebooks/summarization_extractive_bert.ipynb
notebooks/Q_A_Summarization_pretrainedmodels.ipynb

Models - Question Answering

notebooks/Q_A_Summarization_pretrainedmodels.ipynb
notebooks/Omniscient_Reader_Finetuning_BERT.ipynb

-----------------------------------------

Archive

Summarization was done on the first chapter of Harry Potter and the Philosopher's Stone using the following extractive summarization methods from the 'sumy' package:

lex_rank
luhn
lsa
text_rank