From social media trends to medical records, the internet drives a never-ending stream of large volumes of unstructured textual data. The consequence is human fatigue: no person can process and interpret every snippet for meaning.
Modern computer systems can make sense of natural language using an underlying technology called Natural Language Processing (NLP). NLP takes human language as input and can perform one or more of the following operations:
- Sentiment analysis (is it a positive or negative statement?)
- Topic classification (what is it about? Artificial intelligence or Melodic Death Metal?) \m/
- Action selection (what actions should be taken based on this statement?)
- Intent extraction (what is the intention behind this statement?)
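As a minimal illustration of the first operation, here is a toy lexicon-based sentiment scorer in plain JavaScript. The word lists and scoring scheme are invented for this example; real sentiment analyzers use trained models or far larger lexicons.

```javascript
// Toy lexicon-based sentiment scorer: counts positive and negative
// words and reports the overall polarity of a statement.
const POSITIVE = new Set(["good", "great", "love", "excellent", "happy"]);
const NEGATIVE = new Set(["bad", "terrible", "hate", "awful", "sad"]);

function sentiment(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  let score = 0;
  for (const w of words) {
    if (POSITIVE.has(w)) score += 1;
    if (NEGATIVE.has(w)) score -= 1;
  }
  return score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
}

console.log(sentiment("I love this excellent library")); // "positive"
console.log(sentiment("What a terrible, awful day"));    // "negative"
```

The same "score each token against a dictionary" pattern generalizes to the other operations above, which is why tokenization is usually the first step in an NLP pipeline.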
This repository contains several folders, each containing code that performs a specific NLP task:
- Tokenization
- Stemming
- Measuring similarity between two strings
- Classifying input
- Sentiment Analysis
- Phonetic matching
- Spell checking
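To give a flavor of two of these tasks, below is a self-contained sketch of tokenization and string similarity in plain JavaScript. This is illustrative code written for this README, not the repository's actual implementation; the similarity measure shown here is Levenshtein edit distance.

```javascript
// Tokenization: split a sentence into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Similarity: Levenshtein edit distance between two strings, i.e.
// the minimum number of insertions, deletions, and substitutions
// needed to turn one string into the other (0 = identical).
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0
    )
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

console.log(tokenize("Natural Language Processing!")); // ["natural", "language", "processing"]
console.log(levenshtein("kitten", "sitting"));         // 3
```

Edit distance like this also underpins spell checking: a misspelled word's best correction is typically the dictionary word at the smallest distance from it.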
To run the examples:
- Clone the repository
- Install dependencies: `npm install`
- Navigate into the `src` directory and run any of the folders. For example, you can run the tokenization folder with `node tokenization`