This dataset has been made publicly available as part of the Dalhousie Natural Language Processing Lab (DNLP) research, which focuses on incorporating Structural Embedding of Constituency Trees in the Attention-Based Model for Machine Comprehension.

TreeSQuAD2.0 Dataset

Welcome to the TreeSQuAD2.0 dataset, a public resource created by the Dalhousie Natural Language Processing Lab (DNLP). This dataset is the result of my master's thesis research, which focuses on incorporating Structural Embedding of Constituency Trees in the Attention-Based Model for Machine Comprehension. The thesis can be accessed here.

Contents in the 'Processed' Folder

The 'Processed' folder contains the following:

  1. Parsed Trees:

  2. Simplified Trees:

  3. Vocabulary:

    • Vocabulary of Tokens.

Usage

Feel free to explore and utilize the dataset for your NLP and machine comprehension projects. If you find this resource helpful, consider citing this work or providing feedback.
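
As a starting point for exploring the parsed trees, here is a minimal sketch of reading one constituency tree into a nested Python structure. It assumes the trees are stored as Penn Treebank-style bracketed strings, which this README does not confirm, so check the files in the 'Processed' folder first. The helper names `parse_bracketed` and `leaves` and the sample sentence are hypothetical and are not taken from the dataset.

```python
# Assumption: each tree is a Penn Treebank-style bracketed string.
# The sample string below is made up for illustration only.

def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1                      # consume "("
        label = tokens[pos]           # constituent or part-of-speech label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # nested constituent
            else:
                children.append(tokens[pos])    # leaf word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


sample = "(S (NP (DT the) (NN cat)) (VP (VBD sat)))"
tree = parse_bracketed(sample)
print(tree[0])        # root label
print(leaves(tree))   # words in sentence order
```

The sketch uses only the standard library and does not handle unlabeled root brackets such as `((S ...))`. If the files use a different serialization, only `parse_bracketed` would need to change.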

Acknowledgments

I am sincerely grateful to Dr. Vlado Keselj for his invaluable guidance and support throughout this research.

Happy coding!