
College NLP Hackathon

Primary language: Jupyter Notebook. License: MIT.

NLP-Tablets-Annotation

Given a set of images of tablets, run OCR to convert each image to text, then extract the necessary details: the name of the medicine, the molecules in it, the date of manufacture, and the date of expiry. Convert this text to speech. This can be done by scraping drug details to build a drug database and forming a lexicon from it. An API can be used for the text-to-speech conversion.
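As a rough illustration of the last two steps, the sketch below looks up a medicine name in a lexicon and builds the sentence that would be passed to a text-to-speech API. Everything in it is an assumption made for illustration: the `LEXICON` entry is a placeholder rather than scraped data, and `TabletDetails`, `lookup_molecules`, and `to_speech_text` are hypothetical names, not functions from this project. The text-to-speech call itself is left out because the exact API is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical lexicon: normalized medicine name -> molecules.
# The entry below is a placeholder, not real drug data.
LEXICON: Dict[str, List[str]] = {
    "examplemed 500": ["molecule-a", "molecule-b"],
}


@dataclass
class TabletDetails:
    """Fields the project aims to extract from one tablet image."""
    name: str
    molecules: List[str] = field(default_factory=list)
    manufactured: Optional[str] = None
    expiry: Optional[str] = None


def lookup_molecules(name: str, lexicon: Dict[str, List[str]] = LEXICON) -> List[str]:
    """Normalize case and whitespace, then look the name up in the lexicon."""
    key = " ".join(name.lower().split())
    return lexicon.get(key, [])


def to_speech_text(details: TabletDetails) -> str:
    """Build one plain sentence string, skipping any field that is missing."""
    parts = [f"Medicine: {details.name}."]
    if details.molecules:
        parts.append("Contains " + ", ".join(details.molecules) + ".")
    if details.manufactured:
        parts.append(f"Manufactured: {details.manufactured}.")
    if details.expiry:
        parts.append(f"Expires: {details.expiry}.")
    return " ".join(parts)
```

The string returned by `to_speech_text` is what would be handed to whichever text-to-speech service is used.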


Check out the report made for this project here.

Download the required model from here and add it to the model folder.

Tasks Done

  • Scraping Medicine Images from https://www.netmeds.com/prescriptions
  • PreProcessing Image
  • Extracting Text from Image using PaddleOCR
  • Generating Vocabulary of Words for Spelling Correction
  • Spelling Correction using Minimum Edit Distance
  • Annotating Text from Training Corpus
  • Training NER Model using spaCy with Training Data
  • Extracting Required Entities from given text
  • Forming Lexicon for different categories of Image
  • Displaying the end results in a Web App built using Streamlit
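The spelling-correction step can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes plain Levenshtein distance (insert, delete, and substitute each cost 1; some textbooks charge 2 for a substitution), and the names `edit_distance`, `correct`, and the `max_distance` cutoff are hypothetical.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if none is close enough."""
    vocab = list(vocabulary)
    if word in vocab:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(vocab, key=lambda v: (edit_distance(word, v), v), default=None)
    if best is not None and edit_distance(word, best) <= max_distance:
        return best
    return word
```

For example, `correct("paracetmol", ["paracetamol", "ibuprofen"])` returns `"paracetamol"`, since the two strings differ by one inserted letter. The cutoff keeps OCR tokens that match nothing in the vocabulary unchanged instead of forcing them onto an unrelated word.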

Datasets Used

Libraries Used

  • PaddleOCR - For Text Extraction from Image
  • spaCy - For Training NER Model
  • OpenCV (cv2) - For Image Processing
  • NLTK - For Text Processing
  • TTS - Google API for text-to-speech conversion
  • Streamlit - For building Web Application

Colab Links