Dear contributors, please be aware that cuneiform script was used to write several different languages. Among the most popular are Elamite, Babylonian, and Old Persian; this project works on Old Persian. The differences are shown below:
(Photo taken at the National Museum of Iran: the gold plate of King Darius)
/imagedata/
    /source/
        /king/
            source_king_001.jpg
    #example:
    /behistun/
        /darius_1/
            behistun_darius_1_001.jpg
/textdata/
    /eng_transcription_to_english/
        /metadata/
        eng_transcription_to_english_001.json
    /eng_transliteration_to_english/
        /metadata/
        eng_transliteration_to_english_001.json
    /single/
        /metadata/
        /eng_transliteration/
            eng_transliteration_001.json
# "single" refers to text data that consists of text only, with no translation
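The naming pattern in the listing above can be sketched in Python. This is only an illustration, assuming that image file names join the source and king with underscores and end in a three-digit, zero-padded index, as in behistun_darius_1_001.jpg. The function names `image_filename` and `image_path` are hypothetical and are not part of this repository.

```python
def image_filename(source: str, king: str, index: int, ext: str = "jpg") -> str:
    """Build a file name such as 'behistun_darius_1_001.jpg'.

    Assumption: the index is zero-padded to three digits,
    as in the example names in the directory listing.
    """
    return f"{source}_{king}_{index:03d}.{ext}"


def image_path(source: str, king: str, index: int) -> str:
    """Return the relative path of one image under imagedata/ (hypothetical helper)."""
    return f"imagedata/{source}/{king}/{image_filename(source, king, index)}"
```

For example, `image_path("behistun", "darius_1", 1)` follows the /imagedata/behistun/darius_1/ example shown in the listing.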
Old Persian text can be represented in more than one way before it is translated, for example by transliteration or by transcription. The example below shows the difference between them:
Each directory includes a "source.metadata.csv" file that describes the data it contains.
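A metadata file of this kind could be loaded roughly as follows. This is a minimal sketch with several assumptions: the file is UTF-8 encoded, it has a header row, and its literal name is "source.metadata.csv" (if "source" is a placeholder for the real source name, pass the actual file name instead). The function `read_metadata` is hypothetical, and no column names are assumed because the README does not list them.

```python
import csv
from pathlib import Path


def read_metadata(directory: str, filename: str = "source.metadata.csv") -> list[dict[str, str]]:
    """Return the rows of a directory's metadata file as dictionaries.

    Keys are taken from the CSV header row; no specific columns are assumed.
    """
    path = Path(directory) / filename
    # newline="" is the recommended way to open files for the csv module.
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Returning plain dictionaries keeps the sketch independent of whatever columns the metadata files actually contain.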
- Book: The Inscriptions in Old Persian Cuneiform of the Achaemenian Emperors by Ralph Norman Sharp
- Personal photography from the National Museum of Iran and Takht-e-Jamshid (Persepolis)
In the first stage, an OCR model converts Old Persian cuneiform into English transcription text. In the second stage, that transcription is the input to an NLP model or large language model (LLM), which acts as a machine translation model and converts the text into modern languages.
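The two stages described above can be sketched as a simple function composition. This is an illustrative outline only: the README does not specify any model interface, so the type aliases `OcrModel` and `Translator`, the function `run_pipeline`, and the stand-in callables are all hypothetical.

```python
from typing import Callable

# Hypothetical interfaces; the actual models are not defined in this README.
OcrModel = Callable[[bytes], str]    # image bytes -> English transcription text
Translator = Callable[[str], str]    # transcription text -> modern-language text


def run_pipeline(image: bytes, ocr: OcrModel, translate: Translator) -> dict[str, str]:
    """Run stage 1 (OCR), then stage 2 (machine translation)."""
    transcription = ocr(image)              # stage 1: cuneiform image -> transcription
    translation = translate(transcription)  # stage 2: transcription -> modern language
    return {"transcription": transcription, "translation": translation}


if __name__ == "__main__":
    # Stand-in callables for demonstration; real models would replace these.
    fake_ocr = lambda image: "adam"
    fake_translate = lambda text: {"adam": "I"}.get(text, text)
    print(run_pipeline(b"", fake_ocr, fake_translate))
```

Keeping the intermediate transcription in the result makes it possible to check each stage separately.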
Behistun:بیستون
Susa:شوش
Persepolis:پرسپولیس(تخت جمشید)
Elamite:ایلامی
Babylonian:بابِلی
Cyrus:کوروش
Xerxes:خشایار
Artaxerxes:اردشیر
𐎠𐎢𐎼𐎶𐏀𐎡𐎠:اهورامزدا
This repository is licensed under CC-BY-NC; any commercial use is prohibited.