spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.
-
Tagger (POS Tagger): A piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token).
-
Parser (Dependency Parser): Dependency parsing is the task of extracting a dependency parse of a sentence, which represents its grammatical structure and defines the relationships between "head" words and the words that modify those heads.
-
NER (Named Entity Recognition): Named-entity recognition is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
-
Please go through this link to learn more about training with spaCy: https://spacy.io/usage/training
Our main goal is to implement a custom sentiment model that predicts the word or words in a text that ultimately decide or influence its sentiment/emotion.
To achieve this, we use a dataset that we annotated ourselves with Prodigy. The training dataset has the following columns/attributes:
- textID: text identifier
- text: review/tweets in the form of text
- sentiment: The emotion associated with the text (Positive, Negative, and Neutral)
- selected_text: The part of the text that contributes most to deciding the sentiment of the text
The test dataset has the following columns/attributes:
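A row with these columns can be turned into a span-style training example, sketched below. This is a minimal illustration rather than the project's actual preprocessing: it assumes the model is trained NER-style, with the sentiment as the span label, and that selected_text appears verbatim in text. The helper names `selected_text_to_span` and `to_training_example` and the sample row are hypothetical. The `(text, {"entities": [(start, end, label)]})` layout follows the tuple format commonly used for spaCy v2-style NER training data.

```python
def selected_text_to_span(text, selected_text, label):
    """Return (start, end, label) character offsets of selected_text in text.

    Returns None when selected_text is empty or not found verbatim.
    """
    if not selected_text:
        return None
    start = text.find(selected_text)
    if start == -1:
        return None
    return (start, start + len(selected_text), label)


def to_training_example(row):
    """Convert one dataset row (a dict) into a (text, annotations) tuple."""
    span = selected_text_to_span(row["text"], row["selected_text"], row["sentiment"])
    entities = [span] if span is not None else []
    return (row["text"], {"entities": entities})


# Hypothetical row, made up for illustration only.
row = {
    "textID": "example-1",
    "text": "the food was really great today",
    "sentiment": "Positive",
    "selected_text": "really great",
}
example = to_training_example(row)
```

Rows whose selected_text cannot be located produce an empty entity list here; whether to keep or drop such rows is a design choice left open in this sketch.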
- textID: text identifier
- text: review/tweets in the form of text
- sentiment: The emotion associated with the text (Positive, Negative, and Neutral)
- selected_text: The word or words to predict, i.e. those that influence the polarity of the text
We use the Jaccard similarity metric to evaluate text similarity.
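For reference, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it at the word level, assuming lowercased, whitespace-split tokens. The function name `jaccard` and the handling of two empty strings are assumptions, not necessarily the exact scoring used in this project.

```python
def jaccard(str1, str2):
    """Word-level Jaccard similarity between two strings, in the range [0, 1]."""
    a = set(str1.lower().split())
    b = set(str2.lower().split())
    if not a and not b:
        # Assumption: two empty strings count as a perfect match.
        return 1.0
    return len(a & b) / len(a | b)


# One shared word ("great") out of three distinct words overall.
score = jaccard("really great", "great today")
```

A score of 1.0 means the two word sets are identical, and 0.0 means they share no words.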
Happy Coding!