
This project contains Urdu characters and some preprocessing functions

Primary language: Jupyter Notebook. License: MIT.

Urdu-resource-NLP

This repo contains a preprocessor, stopwords, and other functionality needed for Urdu NLP work.

  1. urdu.py contains URDU_DIACRITICS, URDU_DIGIT, URDU_PUNCTUATIONS, URDU_EXTRA_CHARACTER, URDU_ALPHABET, and URDU_STOPWORDS.

  2. The notebook preprocessor.ipynb contains some examples of preprocessing.

  3. capture_phone_or_email_from_text.py contains two functions that accept a string, check whether a phone number or an email address is present in the text, and return a boolean value. The value is:
    0 -> Not found
    1 -> Found
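
The phone and email checks described in item 3 could look roughly like the sketch below. This is not the code from capture_phone_or_email_from_text.py: the function names `has_email` and `has_phone`, both regular expressions, and the assumption that a phone number has 10 to 13 digits are all hypothetical and only illustrate the 0/1 return convention.

```python
import re

# Hypothetical patterns; the repository's own expressions may differ.
# A deliberately simple email shape: local part, "@", domain, dot, TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumption: a phone number is an optional "+" followed by 10 to 13 digits,
# with at most one space or hyphen between neighbouring digits.
# In Python 3 str patterns, \d matches any Unicode decimal digit,
# so Urdu digits (U+06F0..U+06F9) are matched as well as 0-9.
PHONE_RE = re.compile(r"(?<!\d)\+?\d(?:[ -]?\d){9,12}(?!\d)")


def has_email(text: str) -> int:
    """Return 1 if the text contains something shaped like an email, else 0."""
    return 1 if EMAIL_RE.search(text) else 0


def has_phone(text: str) -> int:
    """Return 1 if the text contains something shaped like a phone number, else 0."""
    return 1 if PHONE_RE.search(text) else 0


if __name__ == "__main__":
    print(has_email("رابطہ: someone@example.com"))
    print(has_phone("فون 0300-1234567"))
    print(has_phone("سال 2023"))
```

Returning 0 or 1 rather than the matched string keeps the result easy to use as a feature column or a filter flag during preprocessing.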