The Quran is the official religious text of Islam. It is the one and only book for Muslims, and it has been read and memorized by thousands across the globe in the Arabic language.
The goal of this repository is to provide the Quran in a readable data format so that data science questions can be applied to it.
In this repository, some words in this README.md and variable names in the respective .ipynb/.py files will be in Arabic, though written in English letters. Here's a glossary of those terms:
- Surah: Chapter (e.g., the Quran has a total of 114 chapters)
- Ayah: Verse
- Juz: Part (i.e., the Quran is divided into 30 parts)
Mock data. See the data folder for the updated dataset.
quran_dataset
├── data
│ ├── holy_quran.json
│ ├── surahs # CSV file for all 114 surahs
├── img
│ ├── quran_data_ch_1-2.png # for README.md
├── README.md
├── quran_df.ipynb # in-progress
├── quran_df.py # in-progress
├── quran_json_download.py
├── surah_df.ipynb
└── surah_df.py
- Create the following folders/subfolders locally: data and data/surahs
- Install the following libraries: numpy, pandas, requests, scipy (json is also used, but it is part of the Python standard library and needs no installation)
- Run quran_df.py (Quran CSV file) and/or surah_df.py (Surah CSV file; if you want data for multiple surahs, simply run the file again)
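The folder-creation step above can also be scripted. The following is a minimal sketch, not part of the repository: the helper name create_data_folders is hypothetical, and it only assumes the data and data/surahs layout listed above.

```python
from pathlib import Path


def create_data_folders(root="."):
    """Create data/ and data/surahs/ under root (hypothetical helper).

    parents=True creates data/ on the way to data/surahs/, and
    exist_ok=True makes the call safe to repeat.
    """
    surahs_dir = Path(root) / "data" / "surahs"
    surahs_dir.mkdir(parents=True, exist_ok=True)
    return surahs_dir
```

Calling create_data_folders() from the repository root before running quran_df.py or surah_df.py gives the scripts a place to write their CSV files.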
The data compiled here (holy_quran.json) is derived from the Al Quran Cloud API.
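To illustrate how a nested JSON file like this can become a flat table, here is a rough sketch using only the standard library. The key names (data, surahs, ayahs, number, englishName, numberInSurah, juz, text) and the output column names are assumptions about the layout, so check them against holy_quran.json before relying on them. The sample payload holds placeholder strings, not Quran text, and this is not necessarily how quran_df.py does it.

```python
import csv
import io


def flatten_quran(payload):
    """Turn the nested surah -> ayah structure into one flat row per ayah."""
    rows = []
    for surah in payload["data"]["surahs"]:
        for ayah in surah["ayahs"]:
            rows.append({
                "surah_number": surah["number"],
                "surah_name": surah["englishName"],
                "ayah_number": ayah["numberInSurah"],
                "juz": ayah["juz"],
                "ayah": ayah["text"],
            })
    return rows


# Tiny stand-in payload with placeholder strings (assumed layout, not real data).
sample = {
    "data": {
        "surahs": [
            {
                "number": 1,
                "englishName": "Al-Faatiha",
                "ayahs": [
                    {"numberInSurah": 1, "juz": 1, "text": "<placeholder ayah 1>"},
                    {"numberInSurah": 2, "juz": 1, "text": "<placeholder ayah 2>"},
                ],
            },
            {
                "number": 2,
                "englishName": "Al-Baqara",
                "ayahs": [
                    {"numberInSurah": 1, "juz": 1, "text": "<placeholder ayah 1>"},
                ],
            },
        ]
    }
}

rows = flatten_quran(sample)

# Write the rows as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The same list of rows can be passed to pandas.DataFrame(rows) if a DataFrame is preferred over raw CSV text.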
The purpose and intent of this repository is to aid in the understanding of the Quran.
What do I mean by understanding? I mean exploring the many data science tools/methods one can apply to this data, NOT understanding the text itself, as the data compiled here has no related features.
If there's an error in the ayahs column (e.g., a missing word, تشكيل/Tashkil (i.e., Arabic vowelization), ayahs being out of order, etc.), one should consult the Quran, as that's the only correct source.
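One of these error types, ayahs being missing or out of order, can be flagged automatically. The sketch below assumes each row is a dict with hypothetical surah_number and ayah_number keys and that ayahs in a surah should run 1, 2, 3, and so on. It cannot detect a missing word or a Tashkil error; those can only be checked against the Quran itself.

```python
def find_ayah_order_issues(rows):
    """Return (surah_number, expected, found) for every ayah that breaks the sequence."""
    issues = []
    expected = {}  # next ayah number expected for each surah
    for row in rows:
        surah = row["surah_number"]
        want = expected.get(surah, 1)
        if row["ayah_number"] != want:
            issues.append((surah, want, row["ayah_number"]))
        # Continue counting from whatever number was actually found.
        expected[surah] = row["ayah_number"] + 1
    return issues


# Illustrative rows only: surah 1 skips ayah 3, surah 2 is in order.
rows_with_gap = [
    {"surah_number": 1, "ayah_number": 1},
    {"surah_number": 1, "ayah_number": 2},
    {"surah_number": 1, "ayah_number": 4},
    {"surah_number": 2, "ayah_number": 1},
    {"surah_number": 2, "ayah_number": 2},
]

issues = find_ayah_order_issues(rows_with_gap)
```

An empty result only means the numbering is consecutive; it says nothing about whether the text of each ayah is correct.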
The data presented does not and will never take precedence over the Quran.