NLP projects

Primary language: Jupyter Notebook. License: MIT.

NLP

This repository contains notebooks and other resources for NLP projects.

Senti_Analysis

A Jupyter notebook with a worked example of sentiment analysis. It also serves as an introductory tutorial on the Naive Bayes algorithm. The text in the notebook contains several typos; a corrected version of the text can be found here.
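For readers who want a feel for the technique before opening the notebook, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing. It is not the notebook's code: the function names (`tokenize`, `train`, `predict`) and the four-sentence toy dataset are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would also handle punctuation.
    return text.lower().split()


def train(examples):
    """Count documents per label and words per label.

    `examples` is a list of (text, label) pairs.
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for `text`."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Log likelihood with add-one smoothing.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
TOY_DATA = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot awful acting", "neg"),
]

model = train(TOY_DATA)
print(predict(model, "great wonderful movie"))
print(predict(model, "awful boring movie"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that is missing from one class from driving that class's probability to zero.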

NER_FromImages

This notebook shows how to set up a basic pipeline for extracting information from images. Tesseract extracts the text from each image, and the NER step is set up with both the Transformers pipeline and spaCy. An extended version of this notebook is NER_FromImages2. A detailed version along with data can also be found here.
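The image-to-entities flow described above can be sketched roughly as follows. This is an assumption-laden outline rather than the notebook's code: it assumes the `pytesseract`, `Pillow`, and `spacy` packages are installed, that the Tesseract binary is on the PATH, and that the `en_core_web_sm` model has been downloaded. The helper names (`ocr_image`, `extract_entities`, `group_by_label`) and the file name `invoice.png` are hypothetical.

```python
from collections import defaultdict


def ocr_image(image_path):
    """Run Tesseract OCR on an image and return the recognized text."""
    # Imported lazily so the rest of the module works without these packages.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def extract_entities(text, model_name="en_core_web_sm"):
    """Return (entity text, entity label) pairs found by a spaCy pipeline."""
    import spacy

    nlp = spacy.load(model_name)  # assumed model; any spaCy pipeline with NER works
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def group_by_label(entities):
    """Group (text, label) pairs into a dict mapping label -> list of texts."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


if __name__ == "__main__":
    # "invoice.png" is a placeholder path, not a file shipped with this repo.
    raw_text = ocr_image("invoice.png")
    print(group_by_label(extract_entities(raw_text)))
```

OCR output is usually noisy, so in practice some cleanup (collapsing whitespace, removing stray characters) between the two steps tends to improve the entities the model finds.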

NER_FromImages2

This notebook contains code to extract text from XML files and then apply named entity recognition to find fields such as the date, company name, and invoice number using a spaCy pipeline. A detailed version along with data can also be found here. The data used in the notebook can be found on Kaggle here.
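As a rough illustration of the first step, the sketch below flattens an XML document into plain text with the standard library and then hands it to whatever spaCy pipeline the caller provides. The XML layout, tag names, and function names (`xml_to_text`, `tag_entities`) are invented for this example and do not reflect the structure of the Kaggle data. Note that stock spaCy models have no invoice-number label, so that field would presumably need a custom-trained or rule-based component.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string):
    """Concatenate all text nodes of an XML document into one string."""
    root = ET.fromstring(xml_string)
    # itertext() walks every element in document order and yields its text.
    chunks = (chunk.strip() for chunk in root.itertext())
    return " ".join(chunk for chunk in chunks if chunk)


def tag_entities(text, nlp):
    """Apply a spaCy pipeline `nlp` and return (text, label) pairs."""
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


# Hypothetical sample document, for illustration only.
SAMPLE_XML = """
<invoice>
    <company>Acme Ltd</company>
    <date>2021-03-04</date>
    <number>INV-001</number>
</invoice>
"""

print(xml_to_text(SAMPLE_XML))
```

With spaCy installed, `tag_entities(xml_to_text(SAMPLE_XML), spacy.load("en_core_web_sm"))` would run the second step; the model name is again an assumption.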