quran-tajweed

Tajweed annotations for the Qur'an (riwayat Hafs). The data is available as a JSON file with exact character indices for each rule, and as individual decision trees for each rule.

You can use this data to display the Qur'an with tajweed highlighting, refine models for Qur'anic speech recognition, or - if you enjoy decision trees - improve your own recitation.

The following tajweed rules are supported:

  • Ghunnah (ghunnah)
  • Idghaam...
    • With Ghunnah (idghaam_ghunnah)
    • Without Ghunnah (idghaam_no_ghunnah)
    • Mutajaanisain (idghaam_mutajanisayn)
    • Mutaqaaribain (idghaam_mutaqaribayn)
    • Shafawi (idghaam_shafawi)
  • Ikhfa...
    • Ikhfa (ikhfa)
    • Ikhfa Shafawi (ikhfa_shafawi)
  • Iqlab (iqlab)
  • Madd...
    • Regular: 2 harakat (madd_2)
    • al-Aarid/al-Leen: 2, 4, 6 harakat (madd_246)
    • al-Muttasil: 4, 5 harakat (madd_muttasil)
    • al-Munfasil: 4, 5 harakat (madd_munfasil)
    • Laazim: 6 harakat (madd_6)
  • Qalqalah (qalqalah)
  • Hamzat al-Wasl (hamzat_wasl)
  • Lam al-Shamsiyyah (lam_shamsiyyah)
  • Silent (silent)

This project was built using information from ReciteQuran.com, the Dar al-Maarifah tajweed masaahif, and others.

Using the tajweed JSON file

All the data you probably need is in output/tajweed.hafs.uthmani-pause-sajdah.json. It has the following schema:

[
    {
        "surah": 1,
        "ayah": 1,
        "annotations": [
            {
                "rule": "madd_6",
                "start": 245,
                "end": 247
            },
            ...
        ]
    },
    ...
]

The start and end indices of each annotation refer to the Unicode codepoint (not byte!) offset within the Tanzil.net Uthmani Qur'an text. Make sure to download the version with pause marks and sajdah signs, but without rub-el-hizb signs or me_quran tanween shapes. If you use different options or a different text entirely, you must rebuild the data file from scratch (at your own risk) - refer to the next section.
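As a rough illustration of how these indices can be consumed, here is a minimal Python sketch. Python strings are indexed by codepoint, so plain slicing lines up with the offsets in the file. The helper names load_annotations and annotated_spans are hypothetical, and the sketch makes two assumptions the text above does not confirm: that end is exclusive, and that the offsets are relative to whatever string you pass in as text.

```python
import json


def load_annotations(path):
    """Index the annotation file by (surah, ayah) for quick lookup."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return {(e["surah"], e["ayah"]): e["annotations"] for e in entries}


def annotated_spans(text, annotations):
    """Return (rule, substring) pairs, ordered by start offset.

    Assumption: `end` is exclusive and both offsets index into `text`
    by codepoint, which is how Python's str slicing works.
    """
    ordered = sorted(annotations, key=lambda a: a["start"])
    return [(a["rule"], text[a["start"]:a["end"]]) for a in ordered]
```

A typical call would be load_annotations("output/tajweed.hafs.uthmani-pause-sajdah.json"), followed by annotated_spans(text, index[(surah, ayah)]) with the matching Tanzil.net text. If the highlighted substrings look shifted, one of the two assumptions above is probably wrong for your copy of the text.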

This data file is licensed under a Creative Commons Attribution 4.0 International License.

Using the decision trees

tajweed_classifier.py is a script that takes Tanzil.net "Text (with aya numbers)"-style input via STDIN, and produces the tajweed JSON file (as described above) via STDOUT. It reads the decision trees from rule_trees/*.json. Note that the trees have been built to function best with the Madani text; they rely on the presence of pronunciation markers (e.g. maddah) that may not be present in other texts.
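Since the script only talks over STDIN and STDOUT, it can be driven from Python with the standard subprocess module. The sketch below is one way to do that; run_classifier is a hypothetical helper, and the input and output paths are whatever you choose. It assumes the script is run from the repository root so that it can find rule_trees/*.json.

```python
import subprocess
import sys


def run_classifier(input_path, output_path, script="tajweed_classifier.py"):
    """Pipe a Tanzil.net text file through the classifier script.

    The input file is connected to the script's STDIN and the output
    file to its STDOUT. Raises CalledProcessError on a non-zero exit.
    """
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        subprocess.run(
            [sys.executable, script],
            stdin=src,
            stdout=dst,
            check=True,
        )
```

Files are opened in binary mode so that the Arabic text passes through untouched, whatever the platform's default encoding is.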

Ruleset reference

The following are renderings of the decision trees used to determine where each tajweed annotation starts and stops. Attributes are grouped by the letters they belong to, a letter being defined as a base character (e.g. ل) plus any diacritics that follow (codepoints in the Mn category). Superscript/dagger alif is counted as a base character. The number prefixing each attribute indicates which letter the attribute belongs to: negative numbers refer to preceding letters, positive numbers to following letters. Attributes starting with 0_... refer to the exact character being considered. Annotations do not always start or stop on letter boundaries. Refer to tajweed_classifier.py for the definition of each attribute.
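The letter definition above can be sketched in a few lines of Python using the standard unicodedata module. This is an illustration of the grouping rule only, not the code from tajweed_classifier.py; split_letters is a hypothetical name. It assumes that the superscript/dagger alif is U+0670, which Unicode itself classifies as Mn and which therefore needs a special case. How the real classifier treats spaces and pause marks is not covered here; in this sketch any non-Mn character simply starts a new group.

```python
import unicodedata

# Assumption: the superscript/dagger alif is U+0670. Unicode puts it in
# category Mn, so it must be special-cased to count as a base character.
SUPERSCRIPT_ALIF = "\u0670"


def split_letters(text):
    """Group text into letters: a base character plus trailing Mn marks."""
    letters = []
    for ch in text:
        is_mark = unicodedata.category(ch) == "Mn" and ch != SUPERSCRIPT_ALIF
        if is_mark and letters:
            # Diacritic: attach it to the letter currently being built.
            letters[-1] += ch
        else:
            # Base character (or a leading mark with nothing to attach to).
            letters.append(ch)
    return letters
```

With this grouping, an attribute prefixed with -1 would describe the group just before the one containing the current character, and one prefixed with 1 the group just after it.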

ghunnah

Start

ghunnah start decision tree

End

ghunnah end decision tree

hamzat_wasl

Start

hamzat_wasl start decision tree

End

hamzat_wasl end decision tree

idghaam_ghunnah

Start

idghaam_ghunnah start decision tree

End

idghaam_ghunnah end decision tree

idghaam_mutajanisayn

Start

idghaam_mutajanisayn start decision tree

End

idghaam_mutajanisayn end decision tree

idghaam_mutaqaribayn

Start

idghaam_mutaqaribayn start decision tree

End

idghaam_mutaqaribayn end decision tree

idghaam_no_ghunnah

Start

idghaam_no_ghunnah start decision tree

End

idghaam_no_ghunnah end decision tree

idghaam_shafawi

Start

idghaam_shafawi start decision tree

End

idghaam_shafawi end decision tree

ikhfa

Start

ikhfa start decision tree

End

ikhfa end decision tree

ikhfa_shafawi

Start

ikhfa_shafawi start decision tree

End

ikhfa_shafawi end decision tree

iqlab

Start

iqlab start decision tree

End

iqlab end decision tree

lam_shamsiyyah

Start

lam_shamsiyyah start decision tree

End

lam_shamsiyyah end decision tree

madd_2

Start

madd_2 start decision tree

End

madd_2 end decision tree

madd_246

Start

madd_246 start decision tree

End

madd_246 end decision tree

madd_6

Start

madd_6 start decision tree

End

madd_6 end decision tree

madd_munfasil

Start

madd_munfasil start decision tree

End

madd_munfasil end decision tree

madd_muttasil

Start

madd_muttasil start decision tree

End

madd_muttasil end decision tree

qalqalah

Start

qalqalah start decision tree

End

qalqalah end decision tree

silent

Start

silent start decision tree

End

silent end decision tree