Vector-space-word-similarity: A Python repository from kevotushap

# Project: Natural Language Processing Assignment

Description: This project involves analyzing text data using natural language processing techniques. The tasks include building term-document and term-context matrices, applying weighting schemes, analyzing associations between words, and examining potential social biases in the data.
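As an illustration of the term-context side of these tasks, here is a minimal sketch, not the assignment's actual code. This README does not say which weighting schemes or similarity measures are used, so PPMI weighting, cosine similarity, the window size, and the function names `term_context_matrix`, `ppmi`, and `cosine` are all assumptions chosen for the example.

```python
import numpy as np

def term_context_matrix(sentences, vocab, window=2):
    """Count co-occurrences of vocab words within `window` tokens of each other.

    `sentences` is a list of token lists; the window size is an assumption.
    """
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for tokens in sentences:
        for pos, word in enumerate(tokens):
            if word not in index:
                continue
            lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
            for ctx_pos in range(lo, hi):
                ctx = tokens[ctx_pos]
                if ctx_pos != pos and ctx in index:
                    counts[index[word], index[ctx]] += 1
    return counts

def ppmi(counts):
    """Positive pointwise mutual information weighting of a count matrix."""
    total = counts.sum()
    if total == 0:
        return np.zeros_like(counts)
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected if word and context were independent
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat them as 0
    return np.maximum(pmi, 0.0)

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Toy input, made up for illustration (not taken from the datasets).
sentences = [["the", "king", "rules"], ["the", "queen", "rules"]]
vocab = ["the", "king", "queen", "rules"]
tc = term_context_matrix(sentences, vocab)
weighted = ppmi(tc)
similarity = cosine(weighted[1], weighted[2])  # "king" vs. "queen"
```

In this toy input "king" and "queen" occur in identical contexts, so their weighted rows coincide and the similarity comes out at its maximum; real data would give graded values.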

Environment:

Python 3.8.10
Required Packages: scipy, numpy

Usage:

Download or clone the project repository.
Ensure Python 3.8.10 is installed on your system.
Install the required packages.
Run the Python scripts (.py) to execute the code.

Resources:

Skeleton Python code: This code provides a starting point for the assignment and contains stubs for some functions that need to be implemented.
CSV of the complete works of Shakespeare: This file contains the complete works of Shakespeare in CSV format. Each row represents a line from a play, with columns for play name, line number, character name, and the actual line.
Vocab of the complete works of Shakespeare: This file contains the vocabulary extracted from the complete works of Shakespeare.
List of all plays in the dataset: This file contains a list of all the plays included in the Shakespeare dataset.
SNLI corpus: This corpus contains selections from the Stanford Natural Language Inference (SNLI) dataset, which is used for the NLP task of natural language inference. Each line in the corpus contains a sentence that is either a premise (an image caption) or a hypothesis produced by annotators to be in a certain logical relation with the associated premise (entailment, neutral, contradiction). The sentenceID column is a unique index for each sentence, and captionID is an ID for all sentences associated with the same caption/premise.
List of identity labels from Rudinger et al. 2017: This file contains a list of identity labels that can be used for analyzing social biases in the SNLI corpus.
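To show how the Shakespeare CSV, vocabulary, and play list fit together, the following sketch builds a term-document matrix and applies tf-idf weighting. It is a hedged illustration, not the skeleton code: it assumes the CSV has no header row, that the columns appear in the order described above, and that tokens are lowercase and split on whitespace. The function names and the particular tf-idf variant (log-scaled counts times log inverse document frequency) are also assumptions.

```python
import csv
import io
import numpy as np

def term_document_matrix(rows, vocab, play_names):
    """Count how often each vocab word occurs in each play.

    Each row is assumed to be (play name, line number, character name, line text).
    """
    word_index = {w: i for i, w in enumerate(vocab)}
    play_index = {p: j for j, p in enumerate(play_names)}
    matrix = np.zeros((len(vocab), len(play_names)))
    for play, _line_no, _character, text in rows:
        if play not in play_index:
            continue
        for token in text.lower().split():  # naive tokenization, an assumption
            if token in word_index:
                matrix[word_index[token], play_index[play]] += 1
    return matrix

def tf_idf(matrix):
    """Weight counts by log term frequency times log inverse document frequency."""
    n_docs = matrix.shape[1]
    doc_freq = np.count_nonzero(matrix, axis=1)
    idf = np.log10(n_docs / np.maximum(doc_freq, 1))  # guard against words in no play
    tf = np.log10(matrix + 1)
    return tf * idf[:, np.newaxis]

# Toy rows standing in for the real file; they are not copied from the dataset.
sample = io.StringIO(
    "Hamlet,1,HAMLET,to be or not to be\n"
    "Macbeth,1,MACBETH,to morrow and to morrow\n"
)
rows = list(csv.reader(sample))
vocab = ["to", "be", "morrow"]
plays = ["Hamlet", "Macbeth"]
td = term_document_matrix(rows, vocab, plays)
weighted = tf_idf(td)
```

With the real data, `rows` would come from `csv.reader` over the opened CSV file instead of the in-memory sample. A word that appears in every play, such as "to" here, gets an inverse document frequency of zero and so contributes nothing after weighting.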

Discussion: No discussions with others were held regarding this assignment.

Generative AI tool: No generative AI tool was used for this assignment.

Unresolved issues or problems: No unresolved issues or problems were encountered during the completion of this assignment.

References:

Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 632-642).
Rudinger, R., May, C., & Van Durme, B. (2017). Social Bias in Elicited Natural Language Inferences. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing.
