
Natural Language Processing.

Project: Natural Language Processing Assignment

Description: This project involves analyzing text data using natural language processing techniques. The tasks include building term-document and term-context matrices, applying weighting schemes, analyzing associations between words, and examining potential social biases in the data.
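The matrix-building and weighting steps described above can be sketched with NumPy. This is a minimal illustration, not the assignment's skeleton code: the function names create_term_document_matrix, tf_idf, and cosine_similarity are hypothetical, documents are assumed to be pre-tokenized lists of words (for example, one list per play), and tf-idf is shown with raw counts and a base-10 idf, which is only one of several common variants.

```python
import numpy as np


def create_term_document_matrix(documents, vocab):
    """Count how often each vocabulary word occurs in each document.

    documents: list of token lists, one per document (assumed pre-tokenized).
    vocab: list of words; row i of the result corresponds to vocab[i].
    """
    word_to_index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(documents)))
    for j, tokens in enumerate(documents):
        for token in tokens:
            i = word_to_index.get(token)
            if i is not None:  # skip out-of-vocabulary tokens
                matrix[i, j] += 1
    return matrix


def tf_idf(matrix):
    """Weight a term-document count matrix by tf-idf.

    Uses raw counts for tf and log10(N / df) for idf (one common variant).
    """
    num_docs = matrix.shape[1]
    doc_freq = np.count_nonzero(matrix, axis=1)
    # Guard against words that never occur (df = 0) to avoid division by zero.
    idf = np.log10(num_docs / np.maximum(doc_freq, 1))
    return matrix * idf[:, np.newaxis]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0


# Toy example: two tiny "documents" over a four-word vocabulary.
docs = [["to", "be", "or", "not", "to", "be"], ["to", "die", "to", "sleep"]]
vocab = ["to", "be", "die", "sleep"]
counts = create_term_document_matrix(docs, vocab)
weighted = tf_idf(counts)
print(cosine_similarity(counts[:, 0], counts[:, 1]))  # ~0.577 (shared word "to")
print(cosine_similarity(weighted[:, 0], weighted[:, 1]))  # 0.0: "to" gets idf 0
```

The toy example shows why weighting matters: "to" appears in every document, so its idf is log10(2/2) = 0, and after weighting the two documents no longer look similar on the strength of a function word alone.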


Requirements:

  • Python 3.8.10
  • Required packages: scipy, numpy


Setup and usage:

  1. Download or clone the project repository.
  2. Ensure Python 3.8.10 is installed on your system.
  3. Install the required packages, for example with: pip install numpy scipy
  4. Run the Python scripts (.py files) to execute the code.


Provided files:

  1. Skeleton Python code: This code provides a starting point for the assignment and contains stubs for some functions that need to be implemented.

  2. CSV of the complete works of Shakespeare: This file contains the complete works of Shakespeare in CSV format. Each row represents one line from a play, with columns for the play name, line number, character name, and the text of the line.

  3. Vocab of the complete works of Shakespeare: This file contains the vocabulary extracted from the complete works of Shakespeare.

  4. List of all plays in the dataset: This file contains a list of all the plays included in the Shakespeare dataset.

  5. SNLI corpus: This corpus contains selections from the Stanford Natural Language Inference (SNLI) dataset, which is used for the NLP task of natural language inference. Each line in the corpus contains a sentence that is either a premise (an image caption) or a hypothesis written by annotators to stand in a particular logical relation to its premise (entailment, neutral, or contradiction). The sentenceID column is a unique index for each sentence, and captionID identifies all sentences associated with the same caption (premise).

  6. List of identity labels from Rudinger et al. (2017): This file contains identity labels that can be used to analyze social biases in the SNLI corpus.
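For the association and bias analysis, pointwise mutual information (PMI) is a common way to measure how strongly two words are associated. The sketch below is one simple variant, assuming each sentence is already tokenized (for example, lowercased and split on whitespace) and counting co-occurrence at the sentence level. It is not necessarily how Rudinger et al. (2017) or the skeleton code define it, and sentence_pmi and top_associations are hypothetical names.

```python
import math
from collections import Counter
from itertools import combinations


def sentence_pmi(sentences):
    """Compute PMI for every pair of distinct words that co-occur in a sentence.

    sentences: list of token lists. Each sentence counts each word at most
    once, so probabilities are estimated per sentence:
        P(w)         = (#sentences containing w) / N
        P(w1, w2)    = (#sentences containing both) / N
        PMI(w1, w2)  = log2(P(w1, w2) / (P(w1) * P(w2)))
    Pairs that never co-occur are omitted (their PMI would be -infinity).
    Keys are alphabetically sorted word pairs.
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    pmi = {}
    for (w1, w2), joint in pair_counts.items():
        p_joint = joint / n
        p1 = word_counts[w1] / n
        p2 = word_counts[w2] / n
        pmi[(w1, w2)] = math.log2(p_joint / (p1 * p2))
    return pmi


def top_associations(pmi, label, k=5):
    """Return the k words with the highest PMI with a given identity label."""
    scores = []
    for (w1, w2), value in pmi.items():
        if label == w1:
            scores.append((w2, value))
        elif label == w2:
            scores.append((w1, value))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy example (invented sentences, not SNLI data).
sentences = [
    "a woman is cooking".split(),
    "a man is running".split(),
    "a woman is running".split(),
    "a man is sleeping".split(),
]
pmi = sentence_pmi(sentences)
print(top_associations(pmi, "woman", k=1))  # [('cooking', 1.0)]
```

Words that appear in every sentence, such as "a" and "is", get a PMI of 0 with everything, so they do not crowd out more informative associations. On real data, PMI is unreliable for rare words, which is why analyses often apply smoothing or a minimum-count threshold.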

Discussion: No discussions with others were held regarding this assignment.

Generative AI tool: No generative AI tool was used for this assignment.

Unresolved issues or problems: No unresolved issues or problems were encountered during the completion of this assignment.


References:

  • Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 632-642).
  • Rudinger, R., May, C., & Van Durme, B. (2017). Social bias in elicited natural language inferences. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing.