Paninian Generator

Question

Opened this issue 4 years ago · 8 comments

FYI - I have begun coding a Paninian generator. The goal is to implement the Ashtadhyayi, plus vartikas as needed.
As of now, a basic skeleton that handles some pada-sandhi rules has been committed. Over time, I hope to add more rules, and move the process backward, eventually going through the following steps.

Semantic tag input
Prakriti + Pratyaya selection
Prakriti + Pratyaya transformations
Anga Transformation
Samhita - intra pada
Samhita - inter pada
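
The steps above can be read as a chain of stages, each taking the current list of elements and returning a new one. The following is only a minimal sketch of that shape: the stage names are transliterated from the list, and `passthrough`, `STAGES`, and `run_pipeline` are hypothetical names, not code from the generator branch.

```python
from typing import Callable, List, Tuple

Tokens = List[str]
Stage = Callable[[Tokens], Tokens]


def passthrough(tokens: Tokens) -> Tokens:
    """Placeholder stage: returns a copy of its input unchanged."""
    return list(tokens)


# Hypothetical stage names taken from the list above. Every stage is a
# placeholder here; real stages would apply the relevant sutras.
STAGES: List[Tuple[str, Stage]] = [
    ("semantic_tag_input", passthrough),
    ("prakriti_pratyaya_selection", passthrough),
    ("prakriti_pratyaya_transformations", passthrough),
    ("anga_transformation", passthrough),
    ("samhita_intra_pada", passthrough),
    ("samhita_inter_pada", passthrough),
]


def run_pipeline(tokens: Tokens, stages=STAGES):
    """Run each stage in order and record the state after every stage."""
    trace = [("start", list(tokens))]
    for name, stage in stages:
        tokens = stage(tokens)
        trace.append((name, list(tokens)))
    return tokens, trace


final, trace = run_pipeline(["rAma", "as"])
for name, state in trace:
    print(name, state)
```

Keeping a trace per stage makes it possible to start with only the last stages implemented and fill in the earlier ones later, which matches the plan of moving the process backward.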

Take a look at the generator branch - the sandhi.yaml file encodes the sutras I have so far, and process_yaml.py turns them into executable code. prakriya.py is the skeleton execution engine.
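
To illustrate the general idea of turning a declarative sutra entry into executable code, here is a rough sketch. It does not reproduce the actual sandhi.yaml schema or what process_yaml.py does: the field names (`left_ends_with`, `right_starts_with`, `replace_with`) and the `compile_rule` helper are assumptions, and the rule is written as a plain dict of the kind a YAML loader might return. The entry covers only the a + a case of 6.1.101, not the full sutra.

```python
# Hypothetical rule entry, shown as the dict a YAML loader might produce.
# The field names are assumptions, not the real sandhi.yaml format.
RULE_6_1_101 = {
    "id": "6.1.101",
    "name": "akaH savarRe dIrGaH",
    "left_ends_with": "a",      # SLP1 short a at the end of the left element
    "right_starts_with": "a",   # SLP1 short a at the start of the right element
    "replace_with": "A",        # single long vowel replacing both
}


def compile_rule(spec):
    """Turn a declarative rule entry into a function over (left, right)."""
    left_end = spec["left_ends_with"]
    right_start = spec["right_starts_with"]
    replacement = spec["replace_with"]

    def apply(left, right):
        # Return the transformed pair, or None when the rule is not triggered.
        if left.endswith(left_end) and right.startswith(right_start):
            new_left = left[: -len(left_end)] + replacement
            new_right = right[len(right_start):]
            return new_left, new_right
        return None

    return apply


rule = compile_rule(RULE_6_1_101)
print(rule("rAma", "as"))   # ('rAmA', 's')
print(rule("rAma", "su"))   # None
```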

To try it out, run: cd sanskrit_parser/generator ; python test.py

Answer 1 · 2020-10-03T19:08:23.000Z

I think @drdhaval2785 has implemented similar generators. See https://github.com/drdhaval2785/SanskritVerb which I believe now has the older Subanta generation repo merged in. It includes a sandhi generator as well. Should we look at leveraging it before reimplementing?

Answer 2 · 2020-10-04T07:38:13.000Z

Would be happy to help.

Answer 3 · 2020-10-05T16:38:48.000Z

Sure, we should.
@drdhaval2785 - I had looked at this, and I remember we discussed it briefly as well. Is it written entirely in PHP, or is there a Python version available? I remember you mentioning that it applies sutras linearly, following the SK order - do I remember that correctly?
What would be the best way to leverage this?

Answer 4 · 2020-10-06T01:01:18.000Z

This is purely in PHP; there is no Python version. I do not have the time to convert it to Python. I will go through your code and let you know what bottlenecks I ran into, so that you can make better design decisions. I came to regret some of my choices, but by then it was too late to change them.

Answer 5 · 2020-10-06T16:02:27.000Z

Thank you very much. It would be great if you could point to the parts of your PHP that you think are best to reuse (I'm sure there are a lot). We can take up the conversion. The architecture I've tried to pick is classic Paninian rather than SK-based, so it is not a linear run of sutras.

Answer 6 · 2021-01-03T02:15:00.000Z

Current status

YAML format for sutras defined and parser implemented. This allows sutras to be coded easily. This is much better than coding directly in Python, but I'm not 100% happy with the format yet.
Implemented ~300 sutras.
Paninian Prakriya Engine implemented (with some current limitations, such as nitya/anitya tests)
Can generate prakriya for ajanta pum/strI/napum prAtipadikas.
Basic test suite added, with manual and pytest versions
- pytest suite takes too much memory while the manual version (same underlying code) takes very little.

Eventually, this will allow us to replace the INRIA/Sanskrit_data databases with our own pada generator. Also, it will allow us to solve the overgeneration problem in the sandhi splitter by validating output splits with this generator.
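
As a rough sketch of how validating splits against a generator could cut overgeneration: keep only those candidate splits in which every piece is a form the generator can produce. Everything here is hypothetical. `filter_splits` is not an existing function in the splitter, and the small set of forms stands in for a real call into the generator.

```python
def filter_splits(candidate_splits, can_generate):
    """Keep only the splits whose every pada passes the can_generate check."""
    return [
        split
        for split in candidate_splits
        if all(can_generate(pada) for pada in split)
    ]


# Toy stand-in for the generator: a fixed set of forms assumed valid.
# A real check would ask the generator whether it can derive the pada.
KNOWN_PADAS = {"rAmaH", "gacCati"}

candidates = [
    ["rAmaH", "gacCati"],
    ["rA", "maH", "gacCati"],   # overgenerated split with unknown pieces
]

valid = filter_splits(candidates, lambda pada: pada in KNOWN_PADAS)
print(valid)   # [['rAmaH', 'gacCati']]
```

Passing the check in as a function keeps the filter independent of how the generator is eventually exposed, for example as a precomputed table or as an on-demand derivation.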

Answer 7 · 2021-01-03T02:19:20.000Z

$ time python ../../scripts/sanskrit_generator -t rAma -p jas --verbose
unable to import 'smart_open.gcs', disabling that module
INFO     Inputs [rAma, as]
INFO     rAma ['prAtipadika', 'pum']
INFO     as ['pratyaya', 'svAdi', 'sup', 'jas', 'suw', 'bahuvacana', 'praTamA', 'viBakti']
INFO     End Inputs

Prakriya
Input ['rAma', 'as']
Root
Prakriya Node
0 Prakriya Start ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
1 1.1.43 : suqanapuMsakasya  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
1.4.17 : svAdizvasarvanAmasTAne 
1.4.18 : yaci Bam 
1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam 
End
Child
Prakriya Node
2 1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam  ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
3 7.3.109: jasi ca  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.102: praTamayoH pUrvasavarRaH 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
4 6.1.102: praTamayoH pUrvasavarRaH  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
5 6.1.101: akaH savarRe dIrGaH  ['rAma', 'as'] 0-> ['rAmA', 's']
End
Child
Prakriya Node
6 6.1.105.1: dIrGAjjasi ca  ['rAmA', 's'] 0-> ['rAmA', 's']
End
Leaf Node
Final Output [['rAmA', 's']] = ['rAmAs']


Output: ['rAmAs']

real    0m10.504s
user    0m10.268s
sys     0m0.232s
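
The "triggered but did not win" lines in the trace show that several sutras can be candidates at one step, with one of them chosen. The sketch below illustrates that selection with a single, deliberately simple tie-break: the sutra with the highest number wins, in the spirit of the traditional "later rule prevails" principle. This is an assumption made for illustration, not the engine's actual logic. It reproduces the winners at nodes 3 and 4 above, but not node 1 (where 1.1.43 wins over later sutras), so the real engine evidently applies further criteria. `sutra_key` and `pick_winner` are hypothetical names.

```python
def sutra_key(sutra_id):
    """Convert an id such as '6.1.102' into a tuple that sorts numerically."""
    return tuple(int(part) for part in sutra_id.split("."))


def pick_winner(triggered, rank=sutra_key):
    """Return (winner, losers) among triggered sutra ids.

    The default rank prefers the highest-numbered sutra. A different rank
    function can be passed in to model other conflict-resolution criteria.
    """
    winner = max(triggered, key=rank)
    losers = [sutra for sutra in triggered if sutra != winner]
    return winner, losers


# Candidates at node 3 of the trace above.
winner, losers = pick_winner(["6.1.97", "6.1.102", "6.1.101", "7.3.109"])
print(winner)   # 7.3.109
print(losers)   # ['6.1.97', '6.1.102', '6.1.101']
```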

Answer 8 · 2021-04-01T12:28:02.000Z

replace the INRIA/Sanskrit_data databases with our own pada generator

Have you seen P. Scharf's code? Based on it, a picture like the following can be generated: