/Vector-space-word-similarity

Natural Language Processing.

# Project: Natural Language Processing Assignment

Description: This project involves analyzing text data using natural language processing techniques. The tasks include building term-document and term-context matrices, applying weighting schemes, analyzing associations between words, and examining potential social biases in the data.
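As a rough illustration of the first two tasks, the sketch below builds a term-document count matrix from tokenized documents, applies a weighting, and compares word vectors with cosine similarity. The README does not name the weighting schemes used, so tf-idf here is an assumption. The function names and the toy documents are hypothetical and are not taken from the skeleton code.

```python
import numpy as np

def term_document_matrix(docs, vocab):
    """Count how often each vocabulary word occurs in each document."""
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(docs):
        for token in tokens:
            if token in index:
                matrix[index[token], j] += 1
    return matrix

def tf_idf(matrix):
    """Weight counts by log-scaled term frequency times inverse document frequency.

    Assumption: this is one common tf-idf variant, not necessarily the one
    required by the assignment.
    """
    n_docs = matrix.shape[1]
    df = np.count_nonzero(matrix, axis=1)        # number of documents containing each term
    idf = np.log10(n_docs / np.maximum(df, 1))   # guard against terms that never occur
    tf = np.zeros_like(matrix)
    nonzero = matrix > 0
    tf[nonzero] = 1 + np.log10(matrix[nonzero])
    return tf * idf[:, np.newaxis]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0

# Toy stand-ins for plays (hypothetical data, not from the Shakespeare CSV).
docs = [
    "the king loves the queen".split(),
    "the queen loves the king".split(),
    "the fool mocks the king".split(),
]
vocab = sorted({token for doc in docs for token in doc})
counts = term_document_matrix(docs, vocab)
weighted = tf_idf(counts)

queen = counts[vocab.index("queen")]
fool = counts[vocab.index("fool")]
print(cosine_similarity(queen, fool))
```

Each row of the matrix is a word vector whose dimensions are documents, so two words are similar when they are distributed across the documents in the same way. Words that occur in every document, such as "the" in the toy data, receive a tf-idf weight of zero.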

Environment:

  • Python 3.8.10
  • Required Packages: scipy, numpy

Usage:

  1. Download or clone the project repository.
  2. Ensure Python 3.8.10 is installed on your system.
  3. Install the required packages.
  4. Run the Python scripts (.py) to execute the code.

Resources:

  1. Skeleton Python code: This code provides a starting point for the assignment and contains stubs for some functions that need to be implemented.

  2. CSV of the complete works of Shakespeare: This file contains the complete works of Shakespeare in CSV format. Each row represents a line from a play, with columns for play name, line number, character name, and the actual line.

  3. Vocab of the complete works of Shakespeare: This file contains the vocabulary extracted from the complete works of Shakespeare.

  4. List of all plays in the dataset: This file contains a list of all the plays included in the Shakespeare dataset.

  5. SNLI corpus: This corpus contains selections from the Stanford Natural Language Inference (SNLI) dataset, which is used for the NLP task of natural language inference. Each line in the corpus contains a sentence that is either a premise (an image caption) or a hypothesis produced by annotators to be in a certain logical relation with the associated premise (entailment, neutral, contradiction). The sentenceID column is a unique index for each sentence, and captionID is an ID for all sentences associated with the same caption/premise.

  6. List of identity labels from Rudinger et al. 2017: This file contains a list of identity labels that can be used for analyzing social biases in the SNLI corpus.
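To give a sense of how sentences such as those in the SNLI corpus could feed the association analysis, the following sketch builds a term-context matrix from tokenized sentences and scores it with positive pointwise mutual information (PPMI). The window size, the choice of PPMI, the function names, and the toy sentences are all assumptions made for illustration. None of them come from the skeleton code or the corpus files.

```python
import numpy as np

def term_context_matrix(sentences, vocab, window=2):
    """Count co-occurrences of vocabulary words within +/- `window` tokens."""
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)))
    for tokens in sentences:
        for pos, target in enumerate(tokens):
            if target not in index:
                continue
            start = max(0, pos - window)
            stop = min(len(tokens), pos + window + 1)
            for ctx_pos in range(start, stop):
                if ctx_pos == pos:
                    continue  # a token is not its own context
                context = tokens[ctx_pos]
                if context in index:
                    matrix[index[target], index[context]] += 1
    return matrix

def ppmi(matrix):
    """Positive PMI: max(0, log2(P(w, c) / (P(w) * P(c))))."""
    total = matrix.sum()
    if total == 0:
        return np.zeros_like(matrix)
    p_joint = matrix / total
    p_word = p_joint.sum(axis=1, keepdims=True)
    p_context = p_joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_joint / (p_word * p_context))
    pmi[~np.isfinite(pmi)] = 0.0   # zero counts yield -inf or nan; treat as no association
    return np.maximum(pmi, 0.0)

# Toy stand-ins for captions (hypothetical data, not from the SNLI corpus).
sentences = [
    "a woman is cooking dinner".split(),
    "a man is cooking dinner".split(),
    "a woman is riding a bike".split(),
]
vocab = sorted({token for sent in sentences for token in sent})
cooccurrence = term_context_matrix(sentences, vocab, window=2)
association = ppmi(cooccurrence)

# Words most associated with one label, strongest first.
row = association[vocab.index("woman")]
ranked = sorted(zip(vocab, row), key=lambda pair: -pair[1])
print(ranked[:3])
```

With a list of identity labels in hand, the same ranking could be repeated for each label, and the top-scoring context words compared across labels to look for skewed associations.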

Discussion: No discussions with others were held regarding this assignment.

Generative AI tool: No generative AI tool was used for this assignment.

Unresolved issues or problems: No unresolved issues or problems were encountered during the completion of this assignment.

References:

  • Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 632-642).
  • Rudinger, R., May, C., & Van Durme, B. (2017). Social Bias in Elicited Natural Language Inferences. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing.