# Project: Natural Language Processing Assignment
Description: This project involves analyzing text data using natural language processing techniques. The tasks include building term-document and term-context matrices, applying weighting schemes, analyzing associations between words, and examining potential social biases in the data.
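As a rough illustration of the first two tasks, a term-document matrix (one row per word, one column per play) with a tf-idf weighting might look like the sketch below. It assumes rows shaped like the Shakespeare CSV described under Resources (play name, line number, character name, line text). The function names, the toy rows, the whitespace tokenization, and the log10 TF/IDF variant are all illustrative assumptions, not the skeleton code's actual stubs.

```python
import numpy as np

# Hypothetical toy rows shaped like the Shakespeare CSV (play name, line number,
# character name, line text). These are illustrative, not taken from the real file.
ROWS = [
    ("Hamlet", 1, "HAMLET", "To be or not to be"),
    ("Hamlet", 2, "HORATIO", "My lord the king"),
    ("Macbeth", 1, "MACBETH", "The king is dead"),
    ("Macbeth", 2, "LADY MACBETH", "Out damned spot"),
]


def build_term_document_matrix(rows):
    """Count each word per play: rows of the matrix are words, columns are plays."""
    plays = sorted({play for play, _, _, _ in rows})
    vocab = sorted({w for _, _, _, text in rows for w in text.lower().split()})
    word_index = {w: i for i, w in enumerate(vocab)}
    play_index = {p: j for j, p in enumerate(plays)}
    matrix = np.zeros((len(vocab), len(plays)))
    for play, _, _, text in rows:
        for w in text.lower().split():  # naive whitespace tokenization (assumption)
            matrix[word_index[w], play_index[play]] += 1
    return matrix, vocab, plays


def tf_idf(matrix):
    """Weight counts by log10(count + 1) times log10(N / df), one common tf-idf variant."""
    n_docs = matrix.shape[1]
    tf = np.log10(matrix + 1)
    df = np.maximum(np.count_nonzero(matrix, axis=1), 1)  # plays containing each word
    idf = np.log10(n_docs / df)
    return tf * idf[:, np.newaxis]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0


matrix, vocab, plays = build_term_document_matrix(ROWS)
weighted = tf_idf(matrix)
print(plays)  # ['Hamlet', 'Macbeth']
print(cosine_similarity(matrix[:, 0], matrix[:, 1]))  # raw counts: about 0.2
print(cosine_similarity(weighted[:, 0], weighted[:, 1]))  # tf-idf: 0.0
```

Words that occur in every play receive an IDF of zero. In this toy example the tf-idf cosine between the two plays therefore drops to 0.0, whereas the raw-count cosine is about 0.2, because the shared words "the" and "king" no longer contribute.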
Environment:
- Python 3.8.10
- Required Packages: scipy, numpy
Usage:
- Download or clone the project repository.
- Ensure Python 3.8.10 is installed on your system.
- Install the required packages (for example, `pip install scipy numpy`).
- Run the Python scripts (.py) to execute the code.
Resources:
- Skeleton Python code: This code provides a starting point for the assignment and contains stubs for some functions that need to be implemented.
- CSV of the complete works of Shakespeare: This file contains the complete works of Shakespeare in CSV format. Each row represents a line from a play, with columns for the play name, line number, character name, and line text.
- Vocab of the complete works of Shakespeare: This file contains the vocabulary extracted from the complete works of Shakespeare.
- List of all plays in the dataset: This file lists all the plays included in the Shakespeare dataset.
- SNLI corpus: This corpus contains selections from the Stanford Natural Language Inference (SNLI) dataset, which is used for the NLP task of natural language inference. Each line in the corpus contains a sentence that is either a premise (an image caption) or a hypothesis written by annotators to stand in a particular logical relation (entailment, neutral, or contradiction) to its premise. The sentenceID column is a unique index for each sentence, and captionID identifies all sentences associated with the same caption/premise.
- List of identity labels from Rudinger et al. (2017): This file contains a list of identity labels that can be used to analyze social biases in the SNLI corpus.
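The term-context matrix, PPMI weighting, and word-association analysis mentioned in the Description could be prototyped on SNLI-style sentences roughly as follows, for example to see which words co-occur most strongly with an identity label. This is a sketch under assumptions: the function names (`build_term_context_matrix`, `ppmi`, `top_associations`), the plus-or-minus 2-word window, the whitespace tokenization, the toy sentences, and the identity labels used are all hypothetical and are not taken from the skeleton code, the SNLI files, or the Rudinger et al. list.

```python
import numpy as np

# Hypothetical toy sentences standing in for SNLI hypotheses; not real corpus data.
SENTENCES = [
    "a woman is cooking dinner",
    "a man is playing football",
    "a woman is playing football",
    "the man is cooking",
]


def build_term_context_matrix(sentences, window=2):
    """Count how often each target word co-occurs with each context word within +/- window tokens."""
    tokenized = [s.lower().split() for s in sentences]  # naive tokenization (assumption)
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for toks in tokenized:
        for i, target in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    counts[index[target], index[toks[j]]] += 1
    return counts, vocab


def ppmi(counts):
    """Positive PMI: max(0, log2(P(w, c) / (P(w) P(c)))), with 0 where counts are 0."""
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)  # marginal probability of each target word
    p_c = p_wc.sum(axis=0, keepdims=True)  # marginal probability of each context word
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0  # log2(0) = -inf and 0/0 = nan become 0
    return np.maximum(pmi, 0.0)


def top_associations(weights, vocab, label, k=3):
    """Return up to k (context word, PPMI) pairs with the highest positive PPMI for label."""
    row = weights[vocab.index(label)]
    order = np.argsort(-row, kind="stable")[:k]
    return [(vocab[j], float(row[j])) for j in order if row[j] > 0]


counts, vocab = build_term_context_matrix(SENTENCES, window=2)
weights = ppmi(counts)
for label in ["woman", "man"]:  # hypothetical identity labels
    if label in vocab:
        print(label, top_associations(weights, vocab, label))
```

On a corpus this small, frequent function words such as "a" can top the PPMI ranking; on real data one might filter stopwords or require a minimum co-occurrence count before reading anything into the associations.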
Discussion: No discussions with others were held regarding this assignment.
Generative AI tool: No generative AI tool was used for this assignment.
Unresolved issues or problems: No unresolved issues or problems were encountered during the completion of this assignment.
References:
- Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 632-642).
- Rudinger, R., Naradowsky, J., Leonard, B., & Van Durme, B. (2017). Social Bias in Elicited Natural Language Inferences.