Arabic NLP Tools Inventory
- Tashaphyne Light Stemmer
- Khoja Arabic Stemmer
- Arabic Stemmers: Sebawai and Al-Stem (contact Dr. Kareem Darwish)
- Larkey’s L-stem (contact authors)
- Farasa Segmenter
- ARBML/tkseem
- Alkhalil Lemmatizer
- Alkhalil Stemmer
- Alkhalil Root Extractor
- Qalsadi: Arabic morphological analyzer library for Python.
- Buckwalter Arabic Morphological Analyzer (BAMA)
- Standard Arabic Morphological Analyzer (SAMA, version 3.0 of BAMA)
- ElixirFM: Functional Arabic Morphology
- Xerox Arabic Morphological Analysis and Generation
- (Deprecated) NMSU’s Arabic Morphological Analyzer
- MAGEAD: Morphological Analysis and Generation for Arabic and its Dialects
- ~~Almorgeana: Arabic Lexeme-based Morphological Generation and Analysis, distributed as part of the MADA system.~~
- Alkhalil Morphological Analyzer
- Araflex
- Khoja Arabic Tagger
- AMIRA: Toolkit for Arabic tokenization, POS tagging and base phrase chunking
- MADA: Morphological Analysis and Disambiguation for Arabic, a tool for tokenization, lemmatization, diacritization and POS tagging
- The Stanford Parser
- The Bikel Parser
- MaltParser
- Mohammed Attia’s Rule-based Parser for MSA
- ~~Yassine Benajiba’s ANER (Arabic Named Entity Recognition) system~~
- BBN’s Identifinder (English, Arabic, Chinese)
- Statistical MT public resources: Giza alignment, Pharaoh and Moses decoders, etc.
- Turjuman: a neural machine translation toolkit that translates from 20 languages into Modern Standard Arabic (demo available).
- Tred for Arabic: Tree Editor with Arabic support
- aConCorde: A concordance generation program for Arabic
- Qutrub (source on GitHub)
- The CJKI Arabic Verb Conjugator (CAVE): an interactive Arabic-English verb conjugation application for iOS devices that provides conjugation paradigms for over 1,600 Arabic verbs.
- AraCon: a verb conjugator for Arabic implemented as part of a morphological analyzer and generator (Java).
- Arabic Transcription and Transliteration: an overview of some linguistic issues related to transliteration and transcription, with special focus on CJKI’s Arabic transcription technology.
- The ARAN and NANA systems automatically transcribe CJK and Latin names to and from Arabic.
- Tafqit: tafqeet of Arabic numbers, i.e. converting numerals into their written-out equivalents in Arabic.
- Abuelkhair Corpus: a 1.5-billion-word Arabic corpus comprising more than 5 million newspaper articles and about 3 million unique words. The corpus is provided in two encodings (UTF-8 and CP-1256) and two markup formats (XML and SGML).
- Tashkeela Arabic vocalized (diacritized) Texts corpus
- A fully diacritized modern Arabic translation of the Bible (by Biblica).
- The CJKI Arabic Learner’s Dictionary (CALD) (.pdf): a new concept dictionary that enables learners to gain a full understanding of MSA core vocabulary. An Arabic summary is available under the title القاموس العربي الإنجليزي للمتعلمين ("The Arabic-English Dictionary for Learners") (.pdf).
- Comprehensive Word Lists for Arabic (CJKAWORD): comprehensive monolingual word lists for Arabic covering general vocabulary, proper nouns and technical terms. Includes both a lexical database for canonical forms and a full-form lexicon.
- Arabic Broken Plurals: a comprehensive database of broken (unpredictable) plurals in Arabic, given in three versions: voweled, unvoweled, and transcription.
- Buckwalter’s list of Arabic roots
- Project Root List
- Root list inside the morphological analyzer Sebawai (Contact Dr. Kareem Darwish)
- ANERCorp: a corpus of more than 150,000 words annotated for the NER task.
- ANERGazet: a collection of three gazetteers: (i) Locations, containing names of continents, countries, cities, etc.; (ii) People, containing names of people collected manually from different Arabic websites; and (iii) Organizations, containing names of organizations such as companies and football teams.
- FAOTERM: the United Nations Food and Agriculture Organization’s terminology reference for country names (six languages, including Arabic)
-’s country names in 16 languages including Arabic
-’s multilingual resource for names of geographical entities (and other things)
- U.S. Board on Geographic Names (including Arab countries) – uses SATTS Arabic transliteration
- Database of Arab Names (DAN): a comprehensive database covering over 6.5 million Arab names and variants, based on authoritative resources and extensively proofread by a team of native Arabic-speaking editors.
- Database of Arab Names in Arabic (DANA): a one-of-a-kind resource of Arab personal names and variants in the original Arabic script. This database covers several hundred thousand Arabic-script variants, along with common spelling mistakes.
- Database of Arabic Business Names (DABNA): a database of Arabic company and organization names, now under development.
- Expanded OFAC (XOFAC): to address the shortcomings of OFAC's SDN List, CJKI has developed a comprehensive "Expanded OFAC" database of OFAC full-name variants, the vast majority of which are not listed in OFAC.
- Database of Foreign Names in Arabic (DAFNA): a database of non-Arab names transcribed to Arabic, including Arabic orthographic variants and common orthographic errors.
- Dictionary of Arabic Place Names (DAPNA): a database of Arabic-English place names, including systematic coverage of orthographic variants and common orthographic errors.
- Documents: more than 11,000 Arabic Wikipedia articles in SGML format (the format adopted in CLEF and also accepted by the JIRS system).
- List of Questions: a list of 200 questions of different types. The proportion of each question type matches the proportion adopted in CLEF.
- List of Correct Answers: for each question in the list of questions, a list of correct answers. This list is essential for automatic evaluation.
- Arabic WordNet
- Arabic VerbNet: a large-scale verb lexicon that classifies Arabic verbs using syntactic alternations, inspired by the work of Kipper Schuler (2005) on English VerbNet.
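
The question list and correct-answer list above are described as the basis for automatic evaluation. As a rough illustration of what that could look like, here is a minimal sketch that scores a system's answers against a gold answer list. Everything in it is an assumption for illustration: the function names `normalize` and `accuracy`, the question IDs, the in-memory dictionary layout, and the choice of exact match after stripping diacritics. It is not the scoring procedure used by CLEF or by the resource's authors.

```python
from __future__ import annotations

import re

# Arabic short-vowel marks and related diacritics (U+064B..U+0652),
# plus tatweel (U+0640), which is purely decorative elongation.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(text: str) -> str:
    """Strip diacritics and tatweel, collapse whitespace, and casefold."""
    text = _DIACRITICS.sub("", text)
    return " ".join(text.split()).casefold()


def accuracy(predictions: dict[str, str], gold: dict[str, list[str]]) -> float:
    """Fraction of gold questions whose prediction matches any correct answer."""
    if not gold:
        return 0.0
    correct = 0
    for qid, answers in gold.items():
        predicted = normalize(predictions.get(qid, ""))
        # An empty or missing prediction never counts as correct.
        if predicted and predicted in {normalize(a) for a in answers}:
            correct += 1
    return correct / len(gold)


# Hypothetical data: two questions, each with a list of accepted answers.
gold = {"q1": ["مصر"], "q2": ["القاهرة"]}
predictions = {"q1": "مِصْر", "q2": "بغداد"}  # q1 differs from gold only by diacritics
print(accuracy(predictions, gold))  # 0.5
```

Exact matching is deliberately strict. A real evaluation would probably also need to decide how to treat spelling variants (for example alef and hamza forms), which this sketch leaves untouched.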