Arabic NLP tools inventory
- Tashaphyne Light Stemmer
- Khoja Arabic Stemmer
- Arabic stemmers Sebawai and Al-Stem (contact Dr. Kareem Darwish)
- Larkey’s L-stem (contact authors)
- Farasa Segmenter
- ARBML/tkseem
- Alkhalil Lemmatizer
- Alkhalil Stemmer
- Alkhalil Root Extractor
- Qalsadi: Arabic morphological analyzer library for Python
- Buckwalter Arabic Morphological Analyzer (BAMA)
- Standard Arabic Morphological Analyzer (SAMA, version 3.0 of BAMA)
- ElixirFM: Functional Arabic Morphology
- Xerox Arabic Morphological Analysis and Generation
- (Deprecated) NMSU’s Arabic Morphological Analyzer
- MAGEAD: Morphological Analysis and Generation for Arabic and its Dialects
- ~~Almorgeana: Arabic Lexeme-based Morphological Generation and Analysis, distributed as part of the MADA system~~
- Alkhalil Morphological Analyzer
- Araflex
- Khoja Arabic Tagger
- AMIRA: Toolkit for Arabic tokenization, POS tagging and base phrase chunking
- MADA: Morphological Analysis and Disambiguation for Arabic, a tool for tokenization, lemmatization, diacritization and POS tagging
- [The Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
- The Bikel Parser
- MaltParser
- Mohammed Attia’s rule-based parser for MSA (http://www.attiaspace.com/)
- ~~Yassine Benajiba’s ANER (Arabic Named Entity Recognition) system~~
- BBN’s Identifinder (English, Arabic, Chinese)
- Statistical MT public resources: Giza alignment, Pharaoh and Moses decoders, etc.
- Turjuman: A neural machine translation toolkit for translating from 20 languages into Modern Standard Arabic (demo available)
- Tred for Arabic: Tree Editor with Arabic support
- aConCorde: A concordance generation program for Arabic
- Qutrub: Arabic verb conjugator (source on GitHub)
- The CJKI Arabic Verb Conjugator (CAVE): An interactive Arabic-English verb conjugation application for iOS devices that provides conjugation paradigms for over 1,600 Arabic verbs.
- AraCon: A verb conjugator for Arabic, implemented as part of a morphological analyzer and generator (Java).
- Arabic Transcription and Transliteration: An overview of linguistic issues related to transliteration and transcription, with special focus on CJKI's Arabic transcription technology.
- ARAN and NANA: Systems that automatically transcribe CJK and Latin names to and from Arabic.
- Tafqit: Tafqeet (spelling out) of numbers, converting digits into their written form in Arabic words
- Abuelkhair Corpus: A corpus of more than 5 million newspaper articles, totaling over 1.5 billion words with about 3 million unique words. The corpus is encoded in UTF-8 and CP-1256 and marked up in XML and SGML.
- Tashkeela: A corpus of vocalized (diacritized) Arabic texts
- A fully diacritized modern Arabic translation of the Bible (by Biblica)
- The CJKI Arabic Learner’s Dictionary (CALD) (.pdf): A new-concept dictionary that enables learners to gain a full understanding of core MSA vocabulary. An Arabic summary is available at القاموس العربي الإنجليزي للمتعلمين ("The Arabic-English Learner's Dictionary") (.pdf).
- Comprehensive Word Lists for Arabic (CJKAWORD): Comprehensive monolingual word lists for Arabic covering general vocabulary, proper nouns and technical terms. Includes both a lexical database of canonical forms and a full-form lexicon.
- Arabic Broken Plurals: A comprehensive database of Arabic broken (unpredictable) plurals, given in three versions: voweled, unvoweled, and transcribed.
- Buckwalter’s list of Arabic roots
- Project Root List
- Root list included in the Sebawai morphological analyzer (contact Dr. Kareem Darwish)
- ANERCorp: A corpus of more than 150,000 words annotated for the NER task.
- ANERGazet: A collection of three gazetteers: (i) Locations, containing names of continents, countries, cities, etc.; (ii) People, containing names of people collected manually from various Arabic websites; and (iii) Organizations, containing names of organizations such as companies and football teams.
- FAOTERM: The United Nations Food and Agriculture Organization's terminology reference for country names (six languages, including Arabic)
- Foreignword.com’s country names in 16 languages including Arabic
- Geonames.de’s multilingual resource for names of geographical entities (and other things)
- U.S. Board on Geographic Names (including Arab countries), uses SATTS Arabic transliteration
- Database of Arab Names (DAN): A comprehensive database covering over 6.5 million Arab names and variants, based on authoritative resources and extensively proofread by a team of native Arabic-speaking editors.
- Database of Arab Names in Arabic (DANA): A one-of-a-kind resource of Arab personal names and variants in the original Arabic script. This database covers several hundred thousand Arabic-script variants, along with common spelling mistakes.
- Database of Arabic Business Names (DABNA): A database of Arabic company and organization names, under development at the time of writing.
- Expanded OFAC (XOFAC): To address the shortcomings of OFAC's SDN List, CJKI has developed a comprehensive "Expanded OFAC" database of full-name variants for OFAC entries, the vast majority of which are not listed by OFAC.
- Database of Foreign Names in Arabic (DAFNA): A database of non-Arab names transcribed into Arabic, including Arabic orthographic variants and common orthographic errors.
- Dictionary of Arabic Place Names (DAPNA): A database of Arabic-English place names, with systematic coverage of orthographic variants and common orthographic errors.
- Documents: More than 11,000 Arabic Wikipedia articles in SGML format (the format adopted in CLEF and accepted by the JIRS system).
- [List of Questions](http://users.dsic.upv.es/~ybenajiba/resources/): A list of 200 questions of different types. Each question type appears in the same proportion as in CLEF.
- [List of Correct Answers](http://users.dsic.upv.es/~ybenajiba/resources/): For each question in the list of questions, a list of correct answers. This list is essential for automatic evaluation.
- Arabic WordNet
- Arabic VerbNet: A large-scale verb lexicon that classifies Arabic verbs by their syntactic alternations, inspired by the work of Kipper Schuler (2005) on English VerbNet.