NLP course project

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

CS-F429-Disfluency-Analysis

  • This project investigates various Natural Language Processing techniques for the task of Disfluency Detection.
  • Our approach frames the task as a modified Named Entity Recognition (NER) problem and uses fine-tuned BERT as well as Bi-LSTM-based neural networks to solve it.
  • The experiments were performed on modified versions of the Disfl-QA corpus and the Switchboard corpus, annotated as required.
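
To illustrate what an NER-style framing of disfluency detection can look like, here is a minimal sketch that derives token-level BIO tags from a disfluent/fluent sentence pair, the kind of pair Disfl-QA provides. It is not the annotation procedure used in this project: the function name `bio_tags`, the tag names `B-DIS`/`I-DIS`, and the alignment via `difflib` are all assumptions made for illustration, and the example sentence is invented.

```python
from difflib import SequenceMatcher


def bio_tags(disfluent, fluent):
    """Hypothetical helper: tag each token of the disfluent sentence.

    Tokens that survive in the fluent version get "O"; tokens that the
    alignment marks as deleted or replaced get "B-DIS" (first token of a
    span) or "I-DIS" (subsequent tokens of the same span).
    """
    tags = ["O"] * len(disfluent)
    matcher = SequenceMatcher(a=disfluent, b=fluent, autojunk=False)
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" cover tokens missing from the fluent text.
        if op in ("delete", "replace"):
            for k in range(i1, i2):
                tags[k] = "B-DIS" if k == i1 else "I-DIS"
    return tags


# Invented example pair, whitespace-tokenized for simplicity.
disfluent = "what is the capital of no wait the largest city in france".split()
fluent = "what is the largest city in france".split()

tags = bio_tags(disfluent, fluent)
for token, tag in zip(disfluent, tags):
    print(f"{token}\t{tag}")
```

Sequences of `(token, tag)` pairs like these are the usual input format for token-classification models such as a fine-tuned BERT or a Bi-LSTM tagger. A real pipeline would also need to map the tags onto subword tokens, which this sketch leaves out.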

Disfl-QA dataset obtained from: Gupta, A., Xu, J., Upadhyay, S., Yang, D., & Faruqui, M. (2021). Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering. Findings of ACL. https://doi.org/10.18653/v1/2021.findings-acl.293 GitHub repository: https://github.com/google-research-datasets/Disfl-QA

Switchboard Corpus obtained from: Godfrey, John J., and Edward Holliman. Switchboard-1 Release 2 LDC97S62. Web Download. Philadelphia: Linguistic Data Consortium, 1993.

The data section of this repository provides only sample data.