Paninian Generator
FYI - I have begun coding a Paninian generator. The goal is to implement the Ashtadhyayi plus vartikas as needed.
As of now, a basic skeleton that handles some pada-sandhi rules has been committed. Over time, I hope to add more rules and extend the process backward, eventually covering the following steps:
- Semantic tag input
- Prakriti + Pratyaya selection
- Prakriti + Pratyaya transformations
- Anga Transformation
- Samhita - intra pada
- Samhita - inter pada
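The step list above can be read as a pipeline in which each stage rewrites the output of the previous one. Below is a minimal sketch of that idea, assuming a derivation state is a list of SLP1 strings like the ['rAma', 'as'] lists the generator prints. The stage names, `run_pipeline`, and the placeholder stages are hypothetical and not part of the repository; real stages would carry richer data (semantic tags, pratyaya attributes) than plain strings.

```python
from typing import Callable, List, Tuple

# Assumption: a derivation state is a list of SLP1 components, e.g. ['rAmA', 's'].
State = List[str]
# Each stage is a (name, function) pair; the names are hypothetical.
Stage = Tuple[str, Callable[[State], State]]


def identity(state: State) -> State:
    """Placeholder for a stage that is not implemented yet."""
    return list(state)


def join_components(state: State) -> State:
    """Placeholder samhita step: glue components whose sandhi is already done."""
    return ["".join(state)]


# Order mirrors the step list; most stages are identity placeholders.
STAGES: List[Stage] = [
    ("semantic_tags", identity),
    ("prakriti_pratyaya_selection", identity),
    ("prakriti_pratyaya_transformations", identity),
    ("anga_transformation", identity),
    ("samhita_intra_pada", join_components),
    ("samhita_inter_pada", identity),
]


def run_pipeline(state: State, stages: List[Stage]) -> Tuple[State, List[Tuple[str, State]]]:
    """Run the stages in order and record the state after each one."""
    trace = []
    for name, fn in stages:
        state = fn(state)
        trace.append((name, list(state)))
    return state, trace


final, trace = run_pipeline(["rAmA", "s"], STAGES)
print(final)  # ['rAmAs']
```

Recording the state after every stage keeps the same kind of audit trail that the prakriya output shows at the sutra level.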
Take a look at the generator branch: the sandhi.yaml file encodes the sutras I have so far, process_yaml.py turns them into executable code, and prakriya.py is the skeleton execution engine.
To try it out, run:
cd sanskrit_parser/generator ; python test.py
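The idea of encoding sutras as data and compiling them into executable code can be sketched as follows. This is not the actual sandhi.yaml schema or the process_yaml.py logic: the field names (`id`, `name`, `left_final`, `right_initial`) and `compile_sutra` are hypothetical, and the spec is written as the Python dict a YAML loader would return, so the example needs only the standard library.

```python
from typing import Callable, Dict, Optional, Tuple

Boundary = Tuple[str, str]  # (left pada, right pada) in SLP1

# Hypothetical spec for 6.1.77 iko yaRaci, as a YAML loader might return it.
IKO_YANACI = {
    "id": "6.1.77",
    "name": "iko yaRaci",
    # final ik vowel of the left pada -> its yaR replacement
    "left_final": {"i": "y", "I": "y", "u": "v", "U": "v",
                   "f": "r", "F": "r", "x": "l", "X": "l"},
    # the right pada must begin with an ac (vowel)
    "right_initial": "aAiIuUfFxXeEoO",
}


def compile_sutra(spec: Dict) -> Callable[[str, str], Optional[Boundary]]:
    """Turn a declarative spec into a function; None means 'not triggered'."""
    subs: Dict[str, str] = spec["left_final"]
    triggers = set(spec["right_initial"])

    def apply(left: str, right: str) -> Optional[Boundary]:
        if left and right and left[-1] in subs and right[0] in triggers:
            return left[:-1] + subs[left[-1]], right
        return None

    # Name the generated function after the sutra, for readable traces.
    apply.__name__ = "sutra_" + spec["id"].replace(".", "_")
    return apply


rule = compile_sutra(IKO_YANACI)
print(rule("daDi", "atra"))  # ('daDy', 'atra')
print(rule("rAma", "atra"))  # None
```

Keeping the rule data separate from the matching code means that adding a sutra of an existing shape is mostly a matter of writing a new spec.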
I think @drdhaval2785 has implemented similar generators. See https://github.com/drdhaval2785/SanskritVerb which I believe now has the older Subanta generation repo merged in. It includes a sandhi generator as well. Should we look at leveraging it before reimplementing?
Would be happy to help.
Sure, we should.
@drdhaval2785 - I had looked at this, and I remember we'd discussed this briefly as well. Is this entirely in PHP, or is there a Python version available? I remember you mentioning that this is a linear application of sutras based on the SK (Siddhanta Kaumudi) order - do I recollect it right?
What would be the best way to leverage this?
This is purely in PHP; no Python version is available, and I do not have the time to convert it to Python. I will go through your code and let you know which bottlenecks I ran into, so that you can make better design decisions. I came to regret some of my choices, but by then it was too late.
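To make the question about "linear application of sutras in SK order" concrete, below is a minimal sketch contrasting a single pass over rules in a fixed order with repeatedly re-scanning until no rule applies. Both functions and the two toy rules are hypothetical string rewrites, not code from either project; they only show that in a single pass a rule never sees a context created by a rule later in the list.

```python
from typing import Callable, List, Optional

# A rule returns the rewritten state, or None if it does not apply.
Rule = Callable[[str], Optional[str]]


def apply_linear(state: str, rules: List[Rule]) -> str:
    """One pass in a fixed order: each rule gets exactly one chance."""
    for rule in rules:
        result = rule(state)
        if result is not None:
            state = result
    return state


def apply_until_stable(state: str, rules: List[Rule], max_steps: int = 100) -> str:
    """Apply the first applicable rule, then re-scan from the top, until none applies."""
    for _ in range(max_steps):
        for rule in rules:
            result = rule(state)
            if result is not None and result != state:
                state = result
                break
        else:
            return state  # no rule changed the state: fixed point reached
    raise RuntimeError("no fixed point reached")


# Toy rules (hypothetical): the second creates the context for the first.
def x_to_y(s: str) -> Optional[str]:
    return s.replace("x", "y") if "x" in s else None


def z_to_x(s: str) -> Optional[str]:
    return s.replace("z", "x") if "z" in s else None


RULES = [x_to_y, z_to_x]
print(apply_linear("z", RULES))        # x  (x_to_y ran before its context existed)
print(apply_until_stable("z", RULES))  # y
```

A fixed order is simple and predictable, but it forces the rule list to be arranged so that every feeding relationship runs forward.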
Current status
- YAML format for sutras defined and parser implemented. This makes sutras easy to encode and is much better than coding them directly in Python, but I'm not 100% happy with the format yet.
- Implemented ~300 sutras.
- Paninian Prakriya Engine implemented (with some current limitations, such as nitya/anitya tests)
- Can generate prakriya for ajanta pum/strI/napum prAtipadikas.
- Basic test suite added, with manual and pytest versions
- The pytest suite takes too much memory, while the manual version (same underlying code) uses very little.
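The "Sutras that were triggered but did not win" entries in the prakriya output suggest an engine that, at each step, collects every applicable sutra and picks one winner. Below is a minimal sketch of such a loop; it is not the code in prakriya.py, and the class names, the toy sutra conditions, and the conflict rule are all assumptions. Conflicts are resolved only by 1.4.2 vipratizeDe paraM kAryam (the later sutra wins) and each sutra fires at most once; a real engine also needs apavAda and nitya/anitya tests, which the status list already flags as a limitation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = List[str]  # SLP1 components, e.g. ['rAma', 'as']


@dataclass
class Sutra:
    number: str                                # e.g. "6.1.102"
    name: str
    apply: Callable[[State], Optional[State]]  # None => not triggered

    def key(self) -> Tuple[int, ...]:
        # "6.1.105.1" -> (6, 1, 105, 1), so sutras compare by position
        return tuple(int(p) for p in self.number.split("."))


@dataclass
class PrakriyaNode:
    sutra: Sutra
    before: State
    after: State
    losers: List[str] = field(default_factory=list)  # triggered but did not win


def step(state: State, sutras: List[Sutra]) -> Optional[PrakriyaNode]:
    triggered = [(s, out) for s in sutras if (out := s.apply(state)) is not None]
    if not triggered:
        return None
    # Simplification: only the "later sutra wins" rule (1.4.2) is modelled.
    winner, out = max(triggered, key=lambda t: t[0].key())
    losers = [s.number for s, _ in triggered if s is not winner]
    return PrakriyaNode(winner, list(state), out, losers)


def derive(state: State, sutras: List[Sutra]) -> Tuple[State, List[PrakriyaNode]]:
    nodes: List[PrakriyaNode] = []
    remaining = list(sutras)
    while (node := step(state, remaining)) is not None:
        nodes.append(node)
        # Simplification: each sutra fires at most once, which guarantees termination.
        remaining = [s for s in remaining if s is not node.sutra]
        state = node.after
    return state, nodes


# Toy conditions (hypothetical): pada-final 'a' followed by one of `initials`.
def _boundary(state: State, initials: str) -> bool:
    return len(state) == 2 and state[0].endswith("a") and state[1].startswith(tuple(initials))


def ato_gune(state: State) -> Optional[State]:  # 6.1.97 (toy): pararUpa
    return [state[0][:-1], state[1]] if _boundary(state, "aeo") else None


def dirgha(state: State) -> Optional[State]:  # 6.1.101 / 6.1.102 (toy, a-vowels only)
    return [state[0][:-1] + "A", state[1][1:]] if _boundary(state, "a") else None


SUTRAS = [
    Sutra("6.1.97", "ato guRe", ato_gune),
    Sutra("6.1.101", "akaH savarRe dIrGaH", dirgha),
    Sutra("6.1.102", "praTamayoH pUrvasavarRaH", dirgha),  # vibhakti condition ignored
]
final, nodes = derive(["rAma", "as"], SUTRAS)
print(final, nodes[0].sutra.number, nodes[0].losers)
# ['rAmA', 's'] 6.1.102 ['6.1.97', '6.1.101']
```

Unlike the real output, the toy 6.1.102 performs the lengthening itself, so the derivation finishes in one step.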
Eventually, this will allow us to replace the INRIA/Sanskrit_data databases with our own pada generator. It will also let us address the overgeneration problem in the sandhi splitter by validating output splits with this generator.
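Validating splits could look roughly like the sketch below. Because a single generator run can take seconds (the run shown below took about 10 s of wall time), this sketch assumes the forms are generated once into a set and candidate splits are checked against it. `build_form_set`, `filter_splits`, and the lookup table standing in for the generator are hypothetical; a real form set would also need verb forms and indeclinables.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

Generator = Callable[[str, str], List[str]]  # (prAtipadika, pratyaya) -> forms


def build_form_set(generate: Generator, stems: Iterable[str],
                   pratyayas: Sequence[str]) -> Set[str]:
    """Run the generator over every stem/pratyaya pair and collect the forms."""
    forms: Set[str] = set()
    for stem in stems:
        for pratyaya in pratyayas:
            forms.update(generate(stem, pratyaya))
    return forms


def filter_splits(splits: Iterable[Sequence[str]], forms: Set[str]) -> List[List[str]]:
    """Keep only the splits in which every pada is a generated form."""
    return [list(split) for split in splits if all(pada in forms for pada in split)]


# Toy stand-in for the generator: a lookup table (hypothetical data).
_TOY_TABLE: Dict[Tuple[str, str], List[str]] = {
    ("rAma", "su"): ["rAmaH"],
    ("rAma", "jas"): ["rAmAs"],
}


def toy_generate(stem: str, pratyaya: str) -> List[str]:
    return _TOY_TABLE.get((stem, pratyaya), [])


FORMS = build_form_set(toy_generate, ["rAma"], ["su", "jas"])
print(filter_splits([["rAmAs"], ["rAmA", "s"]], FORMS))  # [['rAmAs']]
```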
$ time python ../../scripts/sanskrit_generator -t rAma -p jas --verbose
unable to import 'smart_open.gcs', disabling that module
INFO Inputs [rAma, as]
INFO rAma ['prAtipadika', 'pum']
INFO as ['pratyaya', 'svAdi', 'sup', 'jas', 'suw', 'bahuvacana', 'praTamA', 'viBakti']
INFO End Inputs
Prakriya
Input ['rAma', 'as']
Root
Prakriya Node
0 Prakriya Start ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
1 1.1.43 : suqanapuMsakasya ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
1.4.17 : svAdizvasarvanAmasTAne
1.4.18 : yaci Bam
1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam
End
Child
Prakriya Node
2 1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
3 7.3.109: jasi ca ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe
6.1.102: praTamayoH pUrvasavarRaH
6.1.101: akaH savarRe dIrGaH
End
Child
Prakriya Node
4 6.1.102: praTamayoH pUrvasavarRaH ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe
6.1.101: akaH savarRe dIrGaH
End
Child
Prakriya Node
5 6.1.101: akaH savarRe dIrGaH ['rAma', 'as'] 0-> ['rAmA', 's']
End
Child
Prakriya Node
6 6.1.105.1: dIrGAjjasi ca ['rAmA', 's'] 0-> ['rAmA', 's']
End
Leaf Node
Final Output [['rAmA', 's']] = ['rAmAs']
Output: ['rAmAs']
real 0m10.504s
user 0m10.268s
sys 0m0.232s
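In the prakriya above, node 5 (6.1.101 akaH savarRe dIrGaH) turns ['rAma', 'as'] into ['rAmA', 's'], and the leaf node joins the components into rAmAs. Below is a minimal standalone sketch of that single rewrite on SLP1 strings. The function name is hypothetical, and the savarNa check is simplified: ḷ (SLP1 `x`) and the ṛ/ḷ savarNa relationship are not handled.

```python
from typing import Optional, Tuple

# SLP1 ak vowels, short -> long; x (ḷ) is deliberately left out.
LONG = {"a": "A", "i": "I", "u": "U", "f": "F"}
SHORT = {long_v: short_v for short_v, long_v in LONG.items()}


def base(vowel: str) -> str:
    """Map a long vowel to its short form so that savarNa vowels compare equal."""
    return SHORT.get(vowel, vowel)


def akah_savarne_dirghah(left: str, right: str) -> Optional[Tuple[str, str]]:
    """6.1.101 sketch: ak + savarNa vowel -> a single long vowel on the left."""
    if not left or not right:
        return None
    x, y = base(left[-1]), base(right[0])
    if x in LONG and x == y:
        return left[:-1] + LONG[x], right[1:]
    return None


state = akah_savarne_dirghah("rAma", "as")
print(state)           # ('rAmA', 's')
print("".join(state))  # rAmAs
```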