Arabic NLP tools List inventory

GNU General Public License v3.0 (GPL-3.0)

Arabic NLP Tools and Resources Lists

Tools

STEMMING

MORPHOLOGICAL ANALYSIS AND GENERATION

  • Qalsadi: Arabic morphological analyzer library for Python.
  • Buckwalter Arabic Morphological Analyzer (BAMA)
  • Standard Arabic Morphological Analyzer (SAMA, version 3.0 of BAMA)
  • ElixirFM: Functional Arabic Morphology
  • Xerox Arabic Morphological Analysis and Generation
  • (Deprecated) NMSU’s Arabic Morphological Analyzer
  • MAGEAD: Morphological Analysis and Generation for Arabic and its Dialects
  • ~~Almorgeana: Arabic Lexeme-based Morphological Generation and Analysis, distributed as part of the MADA system.~~
  • Alkhalil Morphological Analyzer
  • Araflex

MORPHOLOGICAL DISAMBIGUATION AND POS TAGGING

  • Khoja Arabic Tagger
  • AMIRA: Toolkit for Arabic tokenization, POS tagging and base phrase chunking
  • MADA: Morphological Analysis and Disambiguation for Arabic, a tool for tokenization, lemmatization, diacritization and POS tagging

PARSERS

NAMED ENTITY RECOGNITION

  • ~~Yassine Benajiba’s ANER (Arabic Named Entity Recognition) system~~
  • BBN’s Identifinder (English, Arabic, Chinese)

MACHINE TRANSLATION

TREE EDITING

LEXICOGRAPHY

  • aConCorde: A concordance generation program for Arabic

Verb conjugator

  • Qutrub: source on GitHub
  • The CJKI Arabic Verb Conjugator (CAVE).
    An interactive Arabic-English verb conjugation application for iOS devices that provides conjugation paradigms for over 1,600 Arabic verbs.
  • AraCon: a verb conjugator for Arabic, implemented as part of a morphological analyser and generator (Java).

Transcription and transliteration

  • Arabic Transcription and Transliteration.
    An overview of some linguistic issues related to transliteration and transcription, with special focus on our Arabic transcription technology.
  • The ARAN and NANA systems automatically transcribe CJK and Latin names to and from Arabic.

Numbers to words

  • Tafqit: Tafqeet, the conversion of numbers into their written form in Arabic (تحويل الأرقام إلى ما يقابلها كتابة باللغة العربية)

Poetry

  • Al-Faraheedy-Project

Resources

Corpora

Monolingual corpora

Multilingual corpora

Dictionaries

Wordlists

  • Comprehensive Word Lists for Arabic (CJKAWORD).
    Comprehensive monolingual word lists for Arabic covering general vocabulary, proper nouns and technical terms. Includes both a lexical database for canonical forms and a full-form lexicon.

  • Arabic Broken Plurals.
    A comprehensive database of broken plurals (unpredictable) in Arabic given in three versions -- voweled, unvoweled, and transcription.
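
The relationship between the voweled and unvoweled versions mentioned above can be illustrated with a minimal sketch. It assumes that "voweled" refers to the standard Arabic diacritic marks (U+064B through U+0652, plus the superscript alef U+0670); the function name `strip_vowels` is hypothetical and is not part of any listed resource.

```python
import re

# Assumption: the marks to drop are fathatan..sukun (U+064B-U+0652)
# and the superscript alef (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_vowels(word: str) -> str:
    """Return the unvoweled form of a voweled Arabic word (hypothetical helper)."""
    return _DIACRITICS.sub("", word)


# "kutub" (books), a broken plural, written with explicit damma marks.
voweled = "\u0643\u064F\u062A\u064F\u0628"
unvoweled = strip_vowels(voweled)
print(voweled, unvoweled)
```

Going the other way (restoring vowels) is not a simple character operation, which is one reason a resource that ships both versions is useful.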

ROOT LISTS

GAZETTEERS

  • ANERCorp: a corpus of more than 150,000 words annotated for the NER task.
  • ANERGazet: a collection of three gazetteers: (i) Locations, containing names of continents, countries, cities, etc.; (ii) People, containing names of people collected manually from different Arabic websites; and (iii) Organizations, containing names of organizations such as companies, football teams, etc.
  • FAOTERM: the United Nations’ Food and Agriculture Organization terminology reference for country names (six languages including Arabic)
  • Foreignword.com’s country names in 16 languages including Arabic
  • Geonames.de’s multilingual resource for names of geographical entities (and other things)
  • U.S. Board on Geographic Names (including Arab countries), which uses SATTS Arabic transliteration
  • Database of Arab Names (DAN).
    A comprehensive database covering over 6.5 million Arab names and variants, based on authoritative resources and extensively proofread by a team of Arabic native speaker editors.

  • Database of Arab Names in Arabic (DANA).
    A one-of-a-kind resource of Arab personal names and variants, in the original Arabic script. This database covers several hundred thousand Arabic script variants, along with common spelling mistakes.

  • Database of Arabic Business Names (DABNA).
    Arabic Companies and Organizations. A database of Arabic company and organization names is now under development.

  • Expanded OFAC (XOFAC).
    To address the shortcomings of OFAC's SDN List, CJKI has developed a comprehensive "Expanded OFAC" database of OFAC full name variants, the vast majority of which are not listed in OFAC.

  • Database of Foreign Names in Arabic (DAFNA).
    A database of non-Arab names transcribed to Arabic, including Arabic orthographic variants and common orthographic errors.

  • Dictionary of Arabic Place Names (DAPNA).
    A database of Arabic-English place names including systematic coverage for orthographic variants and common orthographic errors.
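
As a rough illustration of how gazetteers split into location, person and organization lists (as ANERGazet is described above) are typically used, here is a minimal longest-match lookup sketch. The sample entries, the `GAZETTEERS` layout and the `tag_tokens` function are all hypothetical; none of the listed resources is known to ship in this format.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sample entries; a real gazetteer would be loaded from files.
GAZETTEERS: Dict[str, Set[Tuple[str, ...]]] = {
    "LOC": {("القاهرة",), ("المملكة", "المتحدة")},
    "PER": {("نجيب", "محفوظ")},
    "ORG": {("الأمم", "المتحدة")},
}

# Longest entry, in tokens, across all gazetteers.
MAX_LEN = max(len(e) for entries in GAZETTEERS.values() for e in entries)


def tag_tokens(tokens: List[str]) -> List[str]:
    """Label each token LOC/PER/ORG on a gazetteer match, else "O"."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            for label, entries in GAZETTEERS.items():
                if span in entries:
                    tags[i:i + n] = [label] * n
                    i += n
                    matched = True
                    break
            if matched:
                break
        if not matched:
            i += 1
    return tags


print(tag_tokens(["زار", "نجيب", "محفوظ", "القاهرة"]))
```

Exact matching like this misses spelling variants and attached clitics, which is the gap that the variant-oriented databases in this list aim to cover.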

Question answering

  • Documents: more than 11,000 Arabic Wikipedia articles in SGML format (the format adopted in CLEF and also the one accepted by the JIRS system).
  • [List of Questions](http://users.dsic.upv.es/~ybenajiba/resources/): a list of 200 questions of different types. The proportion of each question type matches the proportion adopted in CLEF.
  • [List of Correct Answers](http://users.dsic.upv.es/~ybenajiba/resources/): a list of correct answers for each question in the list of questions. This list is important for automatic evaluation.
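
The automatic evaluation that the list of correct answers enables can be sketched as follows. This is a minimal illustration only: the in-memory dictionaries, the question ids and the `accuracy` function are assumptions, and the actual file format of the question and answer lists is not described here.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivially different strings compare equal."""
    return " ".join(text.split()).casefold()


def accuracy(system: Dict[str, str], gold: Dict[str, List[str]]) -> float:
    """Fraction of questions whose system answer matches any accepted answer.

    system: question id -> answer returned by the QA system (hypothetical layout)
    gold:   question id -> list of accepted correct answers (hypothetical layout)
    """
    if not gold:
        return 0.0
    correct = 0
    for qid, accepted in gold.items():
        predicted = system.get(qid)
        if predicted is None:
            continue  # unanswered questions count as wrong
        if normalize(predicted) in {normalize(a) for a in accepted}:
            correct += 1
    return correct / len(gold)


gold_answers = {"q1": ["القاهرة", "مدينة القاهرة"], "q2": ["1945"]}
system_answers = {"q1": " القاهرة ", "q2": "1946"}
print(accuracy(system_answers, gold_answers))
```

Exact string matching is the simplest possible criterion; an evaluation following CLEF practice may use more lenient matching rules.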

Ontologies

SEMANTIC ONTOLOGIES

  • Arabic WordNet
  • Arabic VerbNet: a large-scale verb lexicon that classifies Arabic verbs using syntactic alternations, inspired by the work of Kipper Schuler (2005) on English VerbNet.