Asmai: (Al'asma'i) Arabic semantic analysis library for Python
Developers: Taha Zerrouki: http://tahadz.com taha dot zerrouki at gmail dot com
Features | value |
---|---|
Authors | Authors.md |
Release | 0.1 |
License | GPL |
Tracker | linuxscout/asmai/Issues |
Source | Github |
Feedback | Comments |
Accounts |
Asmai (Al'asma'i) is an Arabic semantic analysis library for Python. It extracts word pairs that carry a semantic relation of one of the following types:
- subject-verb (fa'iliyya), verb-object (maf'uliyya), word composition (idafa)
pip install asmai
@thesis{zerrouki2020adawat,
author = {Taha Zerrouki},
title = {Towards An Open Platform For Arabic Language Processing},
type = {PhD thesis},
institution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},
date = {2020},
}
import asmai.anasem as asm
text = u"يعبد الله منذ أن تطلع الشمس"
result = []
anasem = asm.SemanticAnalyzer()
result = anasem.analyze_text(text)
# the result contains objects
anasem.pprint(result)
- Extract semantic relations and display only the relations found
>>> import pprint
>>> sem_result = anasem.display_sem(result)
>>> pprint.pprint(sem_result)
[[['الشَّمْسُ', 'تَطْلُعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلُعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلُعْ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلَعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلَعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلَعْ', 'شَمْسٌ', 'طَلَعَ', 'Subject']]]
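The six rows above differ only in the vocalization of the verb form; the last three fields of each row are identical. A minimal sketch for collapsing such variants, assuming the return value of `display_sem(result)` has exactly the nested-list shape shown above and that each row appears to hold two vocalized forms, two lemmas, and a relation label. `unique_relations` is a hypothetical helper, not part of asmai:

```python
def unique_relations(sem_result):
    """Collapse rows that differ only in their vocalized surface forms."""
    seen = set()
    unique = []
    for group in sem_result:          # outer list level of the output
        for row in group:             # one relation candidate per row
            key = tuple(row[2:5])     # assumed: (lemma, lemma, relation label)
            if key not in seen:
                seen.add(key)
                unique.append(key)
    return unique


# Two of the six rows copied from the output shown above.
sample = [[
    ['الشَّمْسُ', 'تَطْلُعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
    ['الشَّمْسُ', 'تَطْلُعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
]]
print(unique_relations(sample))
```

Keeping the first occurrence of each key preserves the order in which relations were found.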
- Extract semantic relations and display all words with their tags
>>> sem_result = anasem.display_sem(result, all=True)
>>> pprint.pprint(sem_result)
[('يعبد', 'O', []),
('الله', 'O', []),
('منذ', 'O', []),
('أن', 'O', []),
('تطلع', 'B', []),
('الشمس',
'I',
[['الشَّمْسُ', 'تَطْلُعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلُعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلُعْ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلَعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلَعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
['الشَّمْسُ', 'تَطْلَعْ', 'شَمْسٌ', 'طَلَعَ', 'Subject']])]
>>>
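With `all=True`, every token comes back as a `(word, tag, relations)` tuple, and in the output above only the tokens tagged `B` and `I` take part in a relation. A small sketch for keeping just those tokens, assuming the tuple shape shown above and that `O` marks tokens outside any relation (an interpretation of the sample, not a documented guarantee). `tokens_with_relations` is a hypothetical helper:

```python
def tokens_with_relations(tagged):
    """Return (word, tag, number of relations) for tokens not tagged 'O'."""
    return [
        (word, tag, len(relations))
        for word, tag, relations in tagged
        if tag != 'O'
    ]


# Shortened copy of the output shown above.
tagged_sample = [
    ('يعبد', 'O', []),
    ('تطلع', 'B', []),
    ('الشمس', 'I', [['الشَّمْسُ', 'تَطْلُعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject']]),
]
print(tokens_with_relations(tagged_sample))
```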
- Convert the result to a pandas DataFrame
>>> import pandas as pd
>>>
>>> # flatten the result
... df = pd.DataFrame(anasem.decode(result))
>>> print(df.head())
action affix affix_key forced_word_case ... unvocalized unvoriginal vocalized word
0 -ي-- -ي--|المضارع المنصوب:هو:y False ... يعبد عبد يُعَبِّدَ يعبد
1 -ي-- -ي--|المضارع المجهول المجزوم:هو:y False ... يعبد عبد يُعَبَّدْ يعبد
2 -ي-- -ي--|المضارع المجهول:هو:y False ... يعبد عبد يُعَبَّدُ يعبد
3 -ي-- -ي--|المضارع المعلوم:هو:y False ... يعبد عبد يُعَبِّدُ يعبد
4 -ي-- -ي--|المضارع المجزوم:هو:y False ... يعبد عبد يُعَبِّدْ يعبد
[5 rows x 50 columns]
>>> df.to_csv("output/test.csv", encoding="utf8", sep="\t")
>>>
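Note that `DataFrame.to_csv` does not create missing directories, so the `output/` folder has to exist before the call above. A minimal sketch using only the standard library; `ensure_parent_dir` is a hypothetical helper, and the temporary directory is used here only to keep the demonstration self-contained:

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of `path` if needed and return the path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = ensure_parent_dir(Path(tmp) / "output" / "test.csv")
    parent_exists = csv_path.parent.is_dir()
print(parent_exists)
```

In the session above this would be used as `df.to_csv(ensure_parent_dir("output/test.csv"), encoding="utf8", sep="\t")`.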
1. pyarabic
2. sqlite
3. sylajone
CREATE TABLE sqlite_sequence(name,seq);
CREATE TABLE "derivations" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE ,
"verb" varchar NOT NULL ,
"transitive" BOOL NOT NULL DEFAULT 1,
"derived" VARCHAR NOT NULL ,
"type" VARCHAR NOT NULL
);
CSV Structure:
- Derivation
- id : unique id in the database
- verb : vocalized collocation
- transitive : whether the verb is transitive
- derived : derived word from verb number
- type : type
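The `derivations` schema can be tried out with Python's built-in `sqlite3` module. This is only a sketch against an in-memory database: the inserted row uses placeholder strings (`verb_a`, `derived_a`, `type_a`) that are assumptions for illustration, not values from the real asmai data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "derivations" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE,
        "verb" VARCHAR NOT NULL,
        "transitive" BOOL NOT NULL DEFAULT 1,
        "derived" VARCHAR NOT NULL,
        "type" VARCHAR NOT NULL
    )
""")

# Placeholder row; "transitive" is omitted so the DEFAULT 1 applies.
conn.execute(
    "INSERT INTO derivations (verb, derived, type) VALUES (?, ?, ?)",
    ("verb_a", "derived_a", "type_a"),
)

# Look up all derivations recorded for one verb.
rows = conn.execute(
    "SELECT id, verb, transitive, derived, type FROM derivations WHERE verb = ?",
    ("verb_a",),
).fetchall()
print(rows)
```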
CREATE TABLE "relations" (
"id" INTEGER PRIMARY KEY NOT NULL ,
"first" VARCHAR NOT NULL DEFAULT ('') ,
"second" VARCHAR NOT NULL DEFAULT ('') ,
"rule" VARCHAR NOT NULL DEFAULT (0)
);
CSV Structure:
- id : unique id in the database
- first: first word
- second: second word
- rule : the extraction rule number :
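A word pair can be looked up in a table of this shape with Python's built-in `sqlite3` module. The sketch below runs against an in-memory database; the rows (`word_a`, `word_b`, `word_c`) and the rule values are placeholders chosen for illustration, and `find_rule` is a hypothetical helper, not an asmai function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "relations" (
        "id" INTEGER PRIMARY KEY NOT NULL,
        "first" VARCHAR NOT NULL DEFAULT (''),
        "second" VARCHAR NOT NULL DEFAULT (''),
        "rule" VARCHAR NOT NULL DEFAULT (0)
    )
""")

# Placeholder pairs; real data would hold Arabic words and rule numbers.
conn.executemany(
    'INSERT INTO relations ("first", "second", "rule") VALUES (?, ?, ?)',
    [("word_a", "word_b", "1"), ("word_a", "word_c", "2")],
)


def find_rule(connection, first, second):
    """Return the rule stored for a (first, second) pair, or None."""
    row = connection.execute(
        'SELECT "rule" FROM relations WHERE "first" = ? AND "second" = ?',
        (first, second),
    ).fetchone()
    return row[0] if row else None


print(find_rule(conn, "word_a", "word_c"))
```

Parameterized queries (`?` placeholders) keep the lookup safe when the words come from user text.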