- Read the CosmosQA paper to get a deep understanding of its concepts.
- Ran Wilburone's model as a baseline model.
- Read the DistilBERT and RoBERTa papers and implemented both models.
- Ran the DistilBERT model successfully.
- Implemented RoBERTa-large multiway attention, which achieved 79.22% with a text-similarity module to extract important knowledge from the given context.
- Error analysis is distributed among team members; we plan to separate observations according to the error-analysis types provided in the basic analysis section.
- Results and analysis included.
- Download the trained model and evaluate it using `python run_roberta.py --task_name "commonsenseqa" --do_eval --load_model --do_lower_case --roberta_model roberta-large --data_dir data/ --max_seq_length 220 --gradient_accumulation_steps=10 --output_dir output_path_bin_file --seed 7 --fp16`
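The `--gradient_accumulation_steps=10` flag in the command above reflects a standard trick for limited-GPU setups: average gradients over several small micro-batches before one weight update, so the effective batch size stays large. A minimal sketch with a toy one-parameter linear model (the model and numbers are illustrative, not from our runs):

```python
# Gradient accumulation sketch: averaging micro-batch gradients before a
# single update reproduces the full-batch gradient step exactly.

def grad(w, batch):
    # dL/dw for L = mean((w*x - y)^2) over the batch
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_step(w, micro_batches, lr=0.1):
    # Accumulate the gradient over all micro-batches, weighted by size,
    # then apply ONE update (mimics calling optimizer.step() once).
    total, n = 0.0, 0
    for mb in micro_batches:
        total += grad(w, mb) * len(mb)
        n += len(mb)
    return w - lr * (total / n)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w0 = 0.0
# One step on the full batch vs. the same data split into two micro-batches:
full = w0 - 0.1 * grad(w0, data)
accum = accumulated_step(w0, [data[:2], data[2:]])
print(full, accum)  # identical updates: 3.0 3.0
```

The memory saving comes from only ever materializing one micro-batch at a time, at the cost of more forward/backward passes per update.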
- Read the CosmosQA and BERT papers to get a deep understanding of their concepts.
- Gathered information on the basic methods used in comprehension and question-answering problems and stated it, with error analysis, in the basic analysis section.
- Ran Wilburone's model as a baseline model on the local system as well as on Google Colab, trying various batch sizes, learning rates, and epochs.
- Modified the code, ran the DistilBERT model, and saved the results in the Result folder.
- Now understanding the logic behind the multiway attention model provided in CosmosQA and planning to implement it with more advanced models.
- Worked to understand RoBERTa multiway attention, trained the model, and extracted analysis of where both BERT and RoBERTa go wrong.
- Worked on knowledge infusion by fine-tuning BERT on the SocialIQA dataset and then running CosmosQA on top of it.
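The knowledge-infusion recipe above is two-stage sequential fine-tuning: train on an intermediate dataset (SocialIQA), checkpoint the weights, then continue fine-tuning on the target dataset (CosmosQA) from that checkpoint. A toy one-parameter model stands in for BERT below; the dataset variables are just labels, not real data:

```python
# Two-stage fine-tuning sketch: stage 2 starts from stage 1's weights
# instead of from scratch, which is the whole point of knowledge infusion.

def sgd(w, data, lr=0.05, epochs=20):
    # Plain gradient descent on L = mean((w*x - y)^2).
    for _ in range(epochs):
        g = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * g
    return w

social_iqa = [(1.0, 3.0), (2.0, 6.0)]   # stand-in for the intermediate task
cosmos_qa  = [(1.0, 2.0), (2.0, 4.0)]   # stand-in for the target task

w = 0.0
w = sgd(w, social_iqa)                  # stage 1: intermediate fine-tuning
checkpoint = {"w": w}                   # save, as with model.save_pretrained(...)
w = sgd(checkpoint["w"], cosmos_qa)     # stage 2: resume from the checkpoint
print(round(w, 2))                      # converges to the target task's optimum: 2.0
```

With real models the same flow is `save_pretrained` after stage 1 and `from_pretrained` on that directory before stage 2.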
- Read the CosmosQA paper to learn more about the dataset and the BERT paper to get a ground-level understanding of the BERT model.
- Ran the BERT base code given in the project resources to understand the implementation of BERT.
- Explored a new possibility of implementing K-Adapters in the BERT pretrained model by reading the K-Adapters paper.
- Tried to run the DistilBERT code on the local system, but due to system limitations could not get the code to work.
- Ran the DistilBERT code on Google Colab with different settings.
- Trying to implement RoBERTa on Google Colab.
- Implemented the RoBERTa-large model on Google Colab.
- Researched generative models for implementation but did not move forward due to complications.
- Read about TextFooler and tried to make it work at the implementation level, but did not succeed.
- Implemented query-based text similarity to summarize the context, which led to a minor improvement in model performance, and tried to implement complete text summarization of the context.
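The query-based text-similarity step above can be sketched as follows: score each context sentence against the question and keep only the top-k sentences, shortening the context fed to the model. This minimal version uses bag-of-words cosine similarity; the actual implementation's tokenization, similarity measure, and threshold may differ:

```python
import math
import re
from collections import Counter

def bow(text):
    # Bag-of-words counts; \w+ strips punctuation so "store?" matches "store".
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def truncate_context(context, question, k=2):
    # Keep the k sentences most similar to the question, in original order.
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    q = bow(question)
    scored = [(cosine(bow(s), q), i, s) for i, s in enumerate(sentences)]
    top = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return ". ".join(s for _, _, s in top) + "."

ctx = ("I went to the store. The sky was grey. "
       "At the store I bought some milk. My dog barked loudly.")
print(truncate_context(ctx, "What did I buy at the store?"))
# -> "I went to the store. At the store I bought some milk."
```

Dropping the low-similarity sentences keeps the context under the model's max sequence length while retaining the question-relevant material.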
- Read the CosmosQA and BERT papers to get an overall understanding of the base model of our implementation.
- Ran the DistilBERT code on the local system.
- Read the LSTM-Jump and Skim-RNN papers to find ways to augment the information used as input to the BERT model.
- Formulated an idea to make changes in DistilBERT, using the analysis of the base BERT model, to reach conclusions faster.
- Currently conducting error analysis on the output of the BERT and DistilBERT models.
- Planning to implement the multiway attention model of CosmosQA for DistilBERT.
- Looking to make changes in the model, apart from data augmentation, to increase accuracy.
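The multiway attention idea referenced above lets the passage, question, and answer representations attend to one another before classification. A simplified sketch of the mechanism (the fusion step and shapes here are assumptions for illustration, not the paper's exact architecture):

```python
# Simplified multiway attention: passage tokens attend to the question and
# to a candidate answer, and the attended views are fused with the
# original passage representation.
import numpy as np

def attend(x, y):
    # x: (n, d), y: (m, d) -> each row of x becomes a weighted mix of y rows
    scores = x @ y.T                                   # (n, m) similarity
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # row-wise softmax
    return weights @ y                                 # (n, d) attended view

def multiway(p, q, a):
    pq, pa = attend(p, q), attend(p, a)
    # Simple fusion: original, both attended views, and an interaction term.
    return np.concatenate([p, pq, pa, p * pq], axis=1)

rng = np.random.default_rng(0)
d = 8
p = rng.normal(size=(5, d))   # 5 passage token vectors
q = rng.normal(size=(3, d))   # 3 question token vectors
a = rng.normal(size=(4, d))   # 4 answer token vectors
out = multiway(p, q, a)
print(out.shape)  # (5, 32): passage tokens enriched with multiway context
```

For DistilBERT the same block would sit on top of the encoder's final hidden states, one pass per candidate answer.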
- Completed my part of the error analysis on RoBERTa-large's predictions.
- Implemented the text similarity and summarization code and analyzed a threshold value for context truncation.
- Read the following papers:
- CosmosQA
- BERT, RoBERTa
- Natural Language QA Approaches using Reasoning with External Knowledge (Chitta Baral, Pratyay Banerjee, Kuntal Kumar Pal, Arindam Mitra)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
- Other preparations
- PyTorch basics
- Hugging Face: transformers and tokenizers code walkthrough
- Understanding the dataset and project setup for base-model implementation (wilburOne/CosmosQA)
- Ran Wilburone's base-model implementation from the repo
- Made code changes to remove various errors and adapt to the local machine
- First successful run after many code changes
- Tried different hyper-parameters to select an optimized run for our local configuration (limited GPU)
- Error analysis done for the BERT vs. DistilBERT base-model runs
- Ideation on various other methods to increase model knowledge; some papers for ideas:
- Structural Scaffolds for Citation Intent Classification in Scientific Publications - https://arxiv.org/abs/1904.01608 (here the authors add two extra tasks (scaffolds) on top of BiLSTM-Attn, apart from the main task; similar techniques can be applied here by adding more features on top of BERT)
- Model implementations with changes
COSMOS QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
Model and Approach Analysis: There are two types of baseline models: reading-comprehension models (and modifications of them) and pretrained models, which are used as general approaches for these problems.
- Sliding Windows
- Stanford Attentive Reader
- Gated-Attention Reader
- Co-Matching
- Commonsense RC
- GPT-FT
- BERT-FT
- DMCN
In the reading-comprehension approach, semantic relatedness is an important factor in choosing an answer from the given options: the model infers semantic correlations from the given contextual paragraph.
In the COSMOS-QA dataset, 83% of the correct answers do not appear in the reading-comprehension context, so semantic relatedness alone is not enough to infer the correct answer; common sense is required. Pretrained methods improve on this when fine-tuning is applied to BERT, and more accurate results can be achieved by performing attention and fine-tuning over the context paragraph, question, and answer.
Ablation is also one of the important parts of the study: ablating the question did not affect the prediction results much, while ablating both the question and the context affected the results significantly and caused a drop in accuracy.
Knowledge transfer and fine-tuning on various datasets of the same kind help a lot to improve inference. The authors fine-tune BERT on two existing multiple-choice datasets, RACE and SWAG, before Cosmos: BERT-FT on SWAG provides good results, while BERT-FT on RACE + SWAG gives 68.7% test accuracy.
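The BERT-FT setup described above treats CosmosQA as 4-way multiple choice: each candidate answer is paired with the same context and question, each sequence is scored independently, and a softmax over the four scores picks the answer. A sketch of the input formatting (the `logits` here are hypothetical stand-ins for the real classifier head's output):

```python
import math

def format_choices(context, question, answers):
    # One BERT-style sequence per candidate: the context, then the
    # question plus that answer, with [SEP] between the segments.
    return [f"[CLS] {context} [SEP] {question} {ans} [SEP]" for ans in answers]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

ctx = "It was a long day, so we stopped for ice cream on the way home."
q = "Why did we stop?"
answers = ["To buy ice cream", "To fix the car", "To take photos", "None of the above"]

sequences = format_choices(ctx, q, answers)
# A real model scores each of the four sequences; hypothetical logits here.
logits = [2.1, -0.3, 0.4, -1.0]
probs = softmax(logits)
print(answers[probs.index(max(probs))])  # -> "To buy ice cream"
```

Because the classifier sees each (context, question, answer) pair in one sequence, fine-tuning on other multiple-choice datasets like RACE and SWAG transfers directly: the input format and head are identical, only the data changes.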
- Complex context understanding
These errors require cross-sentence interpretation and reasoning: the model needs to combine information across the context to infer the real answer, learning from complex context analysis to pick the correct choice (choice A in the paper's example).
- Inconsistent with human common sense
In 33% of the errors, the model mistakenly selects a choice that is not consistent with human common sense; the chosen answer might look plausible but does not match human common sense.
- Multi-turn commonsense inference
19% of the errors occur where multiple inferences are present in the sentence and the model needs to choose the proper one.
- Unanswerable questions
14% of the errors come from unanswerable questions, where the model cannot infer the answer from the given choices.
Fine-tuning GPT and GPT-2 could provide better understanding and more accurate answers, which can be considered a different approach.