Versioned Sanskrit linguistic data.
The data has been compiled from a variety of sources. Taken together, it covers almost all lexical forms in Classical Sanskrit literature.
git clone https://github.com/sanskrit/data.git && cd data
python bin/make_data.py
ls all-data
The data comes from several sources, each with its own format. make_data.py converts all of the data to a common format and stores the results in the all-data directory. This is the data that downstream systems should use.
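As a rough illustration of how a downstream system might locate the generated files, here is a minimal sketch. The helper name `list_data_files` is hypothetical and not part of this repository, and the sketch assumes nothing about the file format inside all-data; it only enumerates whatever make_data.py produced.

```python
from pathlib import Path


def list_data_files(root="all-data"):
    """Return every file under `root` as a sorted list of relative paths.

    Hypothetical helper for illustration; not shipped with this repository.
    """
    base = Path(root)
    if not base.is_dir():
        # The directory only exists after the build step has been run.
        raise FileNotFoundError(f"{base} not found; run bin/make_data.py first")
    return sorted(p.relative_to(base) for p in base.rglob("*") if p.is_file())
```

Calling `list_data_files()` from the repository root after running make_data.py should return the generated files; pass a different path if all-data lives elsewhere.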
The data includes verbs, participles, nouns, adjectives, pronouns, indeclinables, morphemes, and sandhi rules. If it's a Sanskrit word, it's probably here.
Each of the data sources used has its own license. Check the LICENSE files in learnsanskrit.org, sanskrit-heritage-site, and monier-williams for details.
All Sanskrit strings are written in SLP1, mainly because it is extremely convenient when processing Sanskrit programmatically: each sound is written as exactly one ASCII character. You can transliterate this data to another scheme, such as IAST or Devanagari, with any transliterator that supports SLP1.
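Because SLP1 uses one character per sound, converting it to a Roman scheme such as IAST can be done with a plain lookup table. The following is a minimal sketch, not a tool from this repository: the names `SLP1_TO_IAST` and `slp1_to_iast` are hypothetical, and the table covers only the core Classical Sanskrit sounds (no accents, avagraha, or Vedic letters). Characters outside the table are passed through unchanged.

```python
# Hypothetical helper for illustration; not part of this repository.
SLP1_TO_IAST = {
    # Vowels
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    # Anusvara and visarga
    "M": "ṃ", "H": "ḥ",
    # Stops and nasals, grouped by place of articulation
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    # Semivowels, sibilants, and h
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(text):
    """Map each SLP1 character to IAST; leave unknown characters as they are."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


# Example usage:
#   slp1_to_iast("kfzRa")  ->  "kṛṣṇa"
#   slp1_to_iast("Darma")  ->  "dharma"
```

A per-character table is enough in this direction only because SLP1 is unambiguous. Going the other way (IAST to SLP1) needs longest-match handling for digraphs such as "kh" and "ai", which is one reason to prefer an established transliteration library for anything beyond quick inspection.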