
Natural Language Processing library for the (🇵🇰) Urdu language.


Urduhack: NLP library for the (🇵🇰) Urdu language

License: MIT

Feature Support

  • Arabic and Urdu Unicode Redundancy Problem
  • Normalization
    • Urdu Single Character Normalization
    • Urdu Combined Characters Normalization
  • Urdu Data Pre-Processing
    • Urdu Diacritics Removal
    • Urdu Spaces Before & After Digits
    • Urdu Spaces After Punctuation
    • Urdu Joined Words Fix
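
To make the features above concrete, here is a minimal sketch of what character normalization and pre-processing can look like. It uses only the Python standard library and is not Urduhack's own API: the function names (`normalize_chars`, `remove_diacritics`, `space_digits`, `space_after_punct`, `preprocess`) and the small mapping tables are assumptions chosen for illustration, and a real implementation would cover many more characters.

```python
import re

# Illustrative subset only: Arabic code points that render like Urdu letters
# but have different Unicode values (the "redundancy problem").
ARABIC_TO_URDU = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0647": "\u06C1",  # ARABIC LETTER HEH -> ARABIC LETTER HEH GOAL
}

# Illustrative subset only: base letter + combining mark -> single code point.
COMBINED_TO_SINGLE = {
    "\u0627\u0653": "\u0622",  # ALEF + MADDAH ABOVE -> ALEF WITH MADDA ABOVE
    "\u0648\u0654": "\u0624",  # WAW + HAMZA ABOVE -> WAW WITH HAMZA ABOVE
}

# Fathatan..sukun plus superscript alef (a common, not exhaustive, set).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Urdu full stop, Arabic comma, Arabic question mark.
PUNCTUATION = re.compile("([\u06D4\u060C\u061F])(?=\\S)")


def normalize_chars(text):
    """Replace combined sequences first, then single look-alike characters."""
    for combined, single in COMBINED_TO_SINGLE.items():
        text = text.replace(combined, single)
    return "".join(ARABIC_TO_URDU.get(ch, ch) for ch in text)


def remove_diacritics(text):
    """Strip the diacritic marks listed in DIACRITICS."""
    return DIACRITICS.sub("", text)


def space_digits(text):
    """Put a space between a letter and an adjacent digit, on either side."""
    text = re.sub(r"([^\W\d_])(\d)", r"\1 \2", text)
    return re.sub(r"(\d)([^\W\d_])", r"\1 \2", text)


def space_after_punct(text):
    """Add a space after punctuation when the next character is not a space."""
    return PUNCTUATION.sub(r"\1 ", text)


def preprocess(text):
    """Run the steps in order: normalize, strip diacritics, fix spacing."""
    text = normalize_chars(text)
    text = remove_diacritics(text)
    text = space_digits(text)
    return space_after_punct(text)
```

Normalization runs before diacritics removal in this sketch so that combining marks such as MADDAH ABOVE are folded into their base letter rather than left stranded. Joined-words fixing is omitted because it needs a word list, not a character table.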

To Do

  • Tokenization
    • Sentence Tokenization
    • Word Tokenization
  • Classification
    • Sentiment Analysis
    • Sentence Classification
    • Document Classification

Urduhack officially supports Python 3.6–3.7, and runs great on PyPy.

Installation

To install Urduhack, simply use pip:

$ pip install urduhack

Documentation

Documentation is available at https://urduhack.readthedocs.io/

How to Contribute

  1. Check for open issues, or open a new issue to start a discussion around a feature idea or a bug. There is a Contributor Friendly tag for issues that should be ideal for people who are not yet familiar with the codebase.
  2. Write a test which shows that the bug was fixed or that the feature works as expected.
  3. Send a pull request and bug the maintainer until it gets merged and published. :)