funderburkjim/elispsanskrit

root-class-pada list from SanskritVerb


The pysanskrit conjugation algorithms for roots take as inputs the root spelling (following the conventions of the Monier-Williams dictionary) and, at least for the present-system tenses (present, imperfect, imperative, and optative), the conjugation class and the 'voice' (Atmanepada or Parasmaipada for active voices).

Thus a comparison with the SanskritVerb system will be aided by developing a correspondence between root-class-pada lists for the two systems.

This issue pertains to such a list for the SanskritVerb system.

The primary dhatupatha proxy used by SanskritVerb has been extracted as verbdata.txt (reference).
Each record of verbdata has among its fields

  • verbwithoutanubandha - We use this as first approximation for MW root spelling
  • gana - the conjugation class from the Dhatupathas
  • pada - values of 'pa', 'A', or 'u' (for Parasmaipada, Atmanepada, or ubhayapada, i.e. either P or A)

So, we develop a listing which presents each root spelling along with an enumeration of the
gana-pada combinations (note we convert 'u' to two cases 'A' and 'P'). The result is the
file sanverb_cp.txt which has 1684 distinct root records.
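The listing step can be sketched roughly as follows. This is a minimal illustration, not the script that actually produced sanverb_cp.txt: it assumes verbdata.txt is colon-delimited with the root without anubandha, the gana, and the pada in the third, fourth, and sixth fields (as the sample verbdata record quoted in this issue suggests), and it assumes 'pa' is relabelled 'P'. The names `PADA_EXPANSION` and `class_pada_by_root` are hypothetical.

```python
from collections import defaultdict

# Assumed mapping from verbdata pada codes to the P/A labels of the listing.
# 'u' (ubhayapada) is expanded into both cases.
PADA_EXPANSION = {"pa": ["P"], "A": ["A"], "u": ["A", "P"]}


def class_pada_by_root(lines):
    """Map each root spelling to the set of (gana, pada) pairs found for it."""
    table = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # Assumed field positions: 2 = verbwithoutanubandha, 3 = gana, 5 = pada
        root, gana, pada_code = fields[2], fields[3], fields[5]
        for pada in PADA_EXPANSION.get(pada_code, []):
            table[root].add((gana, pada))
    return dict(table)


sample = [
    "aci!:gatO yAcane ca ityepare:aYc:01:1000:u:sew:अ॑चिँ॑::559::aFc3_aciz_BvAxiH+gawO_yAcane_ca:",
]
cp_table = class_pada_by_root(sample)
print(sorted(cp_table["aYc"]))
```

Using a set per root means that repeated gana-pada combinations for the same spelling collapse into one entry, which matches the idea of counting distinct root records.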

The generated forms of SanskritVerb are another way to get such a list from the SanskritVerb system.

Specifically, a list of present-tense conjugations, conj_pre.txt, has already been derived from SanskritVerb. A typical record is:

aci! pre 01.1000A:[aYcate aYcete aYcante aYcase aYceTe aYcaDve aYce aYcAvahe aYcAmahe]

Here the class is 01, the pada is A, and the root with anubandha is aci!. We need the root spelling without anubandha, which is given as 'aYc' by the corresponding record of verbdata:

aci!:gatO yAcane ca ityepare:aYc:01:1000:u:sew:अ॑चिँ॑::559::aFc3_aciz_BvAxiH+gawO_yAcane_ca:

When we carry out this process, we get the list conjtab_cp.txt of root-class-pada records. This list has 1655 distinct root records.
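As a rough sketch of this lookup, the following parses a conj_pre record and resolves the root without anubandha through verbdata. The record layout is inferred only from the two sample records above, the pada letter in conj_pre is assumed to be 'A' or 'P', and the lookup key (root with anubandha, gana, number) is an assumption; `parse_conj_pre` and `anubandha_index` are hypothetical names.

```python
import re

# Layout inferred from the sample: "<root> pre <gana>.<number><pada>:[forms]"
CONJ_PRE_RE = re.compile(r"^(\S+)\s+pre\s+(\d+)\.(\d+)([AP]):\[(.*)\]$")


def parse_conj_pre(line):
    """Split one conj_pre record into its parts, or return None if it does not match."""
    match = CONJ_PRE_RE.match(line.strip())
    if match is None:
        return None
    root_anubandha, gana, number, pada, forms = match.groups()
    return {
        "root_anubandha": root_anubandha,
        "gana": gana,
        "number": number,
        "pada": pada,
        "forms": forms.split(),
    }


def anubandha_index(verbdata_lines):
    """Index verbdata by (root with anubandha, gana, number) -> root without anubandha."""
    index = {}
    for line in verbdata_lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) >= 5:
            index[(fields[0], fields[3], fields[4])] = fields[2]
    return index


conj_line = "aci! pre 01.1000A:[aYcate aYcete aYcante aYcase aYceTe aYcaDve aYce aYcAvahe aYcAmahe]"
verbdata_line = "aci!:gatO yAcane ca ityepare:aYc:01:1000:u:sew:अ॑चिँ॑::559::aFc3_aciz_BvAxiH+gawO_yAcane_ca:"

record = parse_conj_pre(conj_line)
index = anubandha_index([verbdata_line])
root = index[(record["root_anubandha"], record["gana"], record["number"])]
print(root, record["gana"], record["pada"])  # aYc 01 A
```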

I don't know the exact process by which SanskritVerb generated the present-tense forms, which are collected in the conj_pre file. However, there are clearly some differences, whether intentional or not, in the two lists of root-class-pada. A more detailed comparison of these two lists is presented in
sanverb_conjtab_cp.txt.
This shows the comparison results in three categories:

  • 1581 roots with the same class-pada in the two lists
  • 74 roots where verbdata has some class-pada in addition to that of conj_pre
  • 29 cases of roots which appear only in verbdata
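The three categories can be computed with plain set comparisons. The sketch below assumes both lists have already been loaded as dictionaries from root spelling to a set of (gana, pada) pairs; the function name, the extra "other" bucket, and the toy roots `root1` to `root3` are illustrative placeholders, not data from either file.

```python
def compare_class_pada(verbdata_cp, conjtab_cp):
    """Sort roots into comparison categories.

    Both arguments map a root spelling to a set of (gana, pada) pairs.
    """
    result = {"same": [], "verbdata_has_more": [], "verbdata_only": [], "other": []}
    for root in sorted(set(verbdata_cp) | set(conjtab_cp)):
        vd = verbdata_cp.get(root, set())
        ct = conjtab_cp.get(root, set())
        if vd == ct:
            result["same"].append(root)
        elif not ct:
            result["verbdata_only"].append(root)
        elif ct < vd:  # conj_pre pairs are a proper subset of the verbdata pairs
            result["verbdata_has_more"].append(root)
        else:
            result["other"].append(root)  # anything not covered by the three categories
    return result


# Toy input with placeholder roots, one per category.
verbdata_cp = {
    "root1": {("01", "P")},
    "root2": {("01", "A"), ("01", "P")},
    "root3": {("04", "P")},
}
conjtab_cp = {
    "root1": {("01", "P")},
    "root2": {("01", "A")},
}
report = compare_class_pada(verbdata_cp, conjtab_cp)
print({name: len(roots) for name, roots in report.items()})
```

The "other" bucket is there only as a safety net: it would catch a root whose conj_pre pairs are not a subset of its verbdata pairs, a case the reported counts do not mention.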

Both of the last two cases could be characterized by saying that there are some missing present tense conjugations in conj_pre. However, it is not known at this time whether these missing conjugations are intentional (due to some details of the SanskritVerb conjugation algorithm) or inadvertent (due to some small error in the creation of the generated forms of the present tense).

While generating sanverb_cp, I noted 7 duplicates in verbdata, i.e. records sharing the same key of the form:

rootwithanubandha+gana+number+pada

The details of these are in sanverb_cp_log.txt.
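A duplicate check of this kind can be sketched with a counter over the four-part key. The field positions (root with anubandha first, then gana, number, and pada in the fourth to sixth fields) are assumed from the sample verbdata record quoted above, and the demo lines are made-up placeholders rather than real verbdata records.

```python
from collections import Counter


def duplicate_keys(lines):
    """Return {key: count} for keys rootwithanubandha+gana+number+pada seen more than once."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # Assumed field positions: 0 = root with anubandha, 3 = gana, 4 = number, 5 = pada
        key = (fields[0], fields[3], fields[4], fields[5])
        counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Placeholder records: the first two share a key, the third does not.
demo = [
    "x!:meaning one:x:01:0001:pa:",
    "x!:meaning two:x:01:0001:pa:",
    "y!:meaning three:y:01:0002:A:",
]
dups = duplicate_keys(demo)
print(dups)
```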

I don't know the exact process by which SanskritVerb generated the present-tense forms, which are collected in the conj_pre file.

@funderburkjim
The process is -
The verb numbers, e.g. 01.0001, were scraped from $verbdata and stored in LIST1 of https://github.com/drdhaval2785/SanskritVerb/blob/master/wrongformfinder.sh.
Then this shell script was run on all the list members, and the generated forms were collected in an XML file.
There are two places where the script ran into problems.

  1. Verbs with upasargas like Akrand, ASAs etc.
  2. Verbs which had different padas in the same gaNa.
    The majority of these were manually deleted from wrongformfinder.sh because of generation issues. The first set consists of verbs that are enumerated in the dhAtupATha with prefixes: 'krand' is a verb, and 'Akrand' is also listed as a verb in some dhAtupAThas. Such entries were deleted.
    The second set gave difficulty in derivation. It is mostly a correctable error in the script, but I was too lazy to fix it, and the fix was computationally expensive in terms of the script's running time. For bulk generation, this slowed the process down too much, so I decided to drop these entries.

For the second kind of verbs, a limited generation run for just those verbs may be possible, which could supplement generatedforms.xml.

You are too clever to hoodwink. :)
Bringing forth my dirty laundry !!!

Bringing forth my dirty laundry

Yes, Dhaval, one can't hide from Jim. You're lazy. But what to say about other Indians in that case? You're like a factory. Too bad that lost in the MSS. world for far too long.