funderburkjim/elispsanskrit

duplicates in verbdata.txt


In preparation for comparing the pysan conjugation algorithms with those of SanskritVerb, I have done some analysis of the $verbdata element of the SanskritVerb program function.php.

This data has been extracted to the file verbdata.txt.

verbdata was analyzed for duplicates in two ways.

In both cases, the significance (if any) of these duplicates is an open question to me.

duplicate verb without anubandha

Each of the verbdata.txt records has the root spelled in two ways:

  • with anubandha(s)
  • without anubandha(s)

Refer to verbdata_dupnorm.txt.

It was found that in 19 cases, a given spelling-with-anubandha was associated with more than one spelling-without-anubandha.
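A minimal sketch of this first check, assuming verbdata.txt is colon-separated with the root-with-anubandha in field 0 and the root-without-anubandha in field 2. These positions are inferred from the sample records quoted later in this thread, not from the field description, and the function name dup_norm is hypothetical; this is not necessarily how verbdata_dupnorm.txt was produced.

```python
from collections import defaultdict

# Assumed field positions, inferred from sample records in this thread.
ANUBANDHA, NORM = 0, 2


def dup_norm(lines):
    """Return {spelling-with-anubandha: sorted spellings-without-anubandha}
    for every anubandha spelling that maps to more than one distinct
    spelling-without-anubandha."""
    forms = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(":")
        forms[fields[ANUBANDHA]].add(fields[NORM])
    return {root: sorted(norms) for root, norms in forms.items() if len(norms) > 1}
```

It could be run as dup_norm(open("verbdata.txt", encoding="utf-8")). Because a set is kept per anubandha spelling, a record repeated verbatim is not reported by this check; only distinct spellings-without-anubandha count.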

These duplicates are an obstacle to matching the records of verbdata.txt with the roots listed in dictionaries such as that of Monier-Williams.

duplicate sutra numbers

Refer to verbdata_dupsutra.txt.

In this discussion, 'sutra number' means the combination of the gana (conjugation class) and the sequence number of a verbdata.txt record. (This file identifies the fields that comprise a record of verbdata.txt.)

There were 43 cases where a given sutra number appears in more than one record of verbdata.txt.
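The sutra-number check can be sketched the same way: group records by the gana and sequence-number fields (assumed here to be fields 3 and 4, as in the sample records in this thread) and keep only the groups with more than one record. dup_sutra is a hypothetical helper, not the script that produced verbdata_dupsutra.txt.

```python
from collections import defaultdict

# Assumed field positions, inferred from sample records in this thread.
GANA, NUMBER = 3, 4


def dup_sutra(lines):
    """Group records by 'gana.number' and return only keys shared by
    two or more records."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(":")
        groups[f"{fields[GANA]}.{fields[NUMBER]}"].append(line)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```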

I presume that some set of parameters derived from the fields of verbdata.txt should identify the particular entity we call a root. A priori, I expected the gana-sequence number (sutra number) to be such a parameter, but the duplicates show that it is not. However, the number of duplicates (43) is quite small (about 2% of the 2213 records of verbdata.txt), which indicates that the sutra number is almost an identifier.

On the other hand, the verb-with-anubandha is not a unique identifier of the records of verbdata.txt either.

Is there a generally accepted dhatu identifier? Is this identifier present in verbdata.txt?

It was found that in 19 cases, a given spelling-with-anubandha was associated with more than one spelling-without-anubandha.

The majority of them seem to be errors.
I will keep you posted as I correct these entries in function.php.
You can regenerate the files later on.

Is there a generally accepted dhatu identifier?

In my experience, it would be the combination gana, sutranumber, pada, iDAgama, meaning.
Meaning is included because different commentators sometimes assign different meanings to the same verb. Where such a verb does not get a separate entry with its own number, two records can have an identical sutranumber but a different meaning.

There may also be genuine tagging errors by the database maker. These entries need to be examined individually.
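One way to test this proposed identifier is to count records per (gana, sutranumber, pada, iDAgama, meaning) tuple; any tuple that occurs more than once is a candidate for individual examination. This sketch assumes meaning is field 1, gana and number are fields 3 and 4, pada is field 5 and iDAgama is field 6, as in the records quoted later in this thread; non_unique_keys is a hypothetical name.

```python
from collections import Counter


def non_unique_keys(lines):
    """Return the (gana, number, pada, iDAgama, meaning) tuples that occur
    in more than one record (field positions are assumptions)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        f = line.split(":")
        counts[(f[3], f[4], f[5], f[6], f[1])] += 1
    return [key for key, n in counts.items() if n > 1]
```

Under this key, two records that share a sutra number but differ in meaning (like the two granTa! records discussed later) count as distinct, while a record repeated verbatim is still flagged.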

Is this identifier present in verbdata.txt?

Yes, the identifier seems to be present in verbdata.txt.

#32 (comment)
Corrections have started.
The changes are noted here.
For your reference, the base dhAtupATha on which Mihail based his numbering seems to be the following:
dhatupatha_svara.pdf
The majority of root numbers tally with it.

YimidA!:Bid,mid:Changed to mid. Bid was error.
kaWi!:utkaRW,kaRW:Removed ut. It was upasarga.
o!laqi!:olaRq,laRq:Removed o.
vella!:vell,vehl:Separate verbs vella! and vehla!. Correction to vehla!
pasi!:paMS,paMs:Separate verbs pasi! and paSi!. Correction to paSi!
barha!:barh,varh:Separate verbs barha! and varha!. Correction to varha!
bfhi!:bfMh,vfMh:Separate verbs bfhi! and vfhi!. Correction to vfhi!
DUpa!:Dup,DUp:Tricky. There are two verbs (alternate forms) under the same number. See image. For now, changed to Dupa!. A final decision is still needed.
vehf!:beh,veh:Same as above. Changed to behf!
DU:Du,DU:Same as above. Changed to Du.
mana!:man,mAn:Changed to mAna!
IKi!:IK,INK:Changed to IKa!
pelf!:pall,pel:Changed to palla!
taqa!:taq,taRq:Changed to taq
vasa!:vas,vasa:Changed to vas
aqqa!:aqq,adq:Changed to adqa!
visa!:bis,vis:Changed to vis
bisa!:bis,vis:Changed to bis
mAna!:man,mAn:Changed to mAn

The 19 'duplicate verb without anubandha' entries are now corrected in $verbdata.
#32 (comment) is still pending.
@funderburkjim will you please regenerate the statistics after this first round of corrections?

I am sure the 'duplicate sutra number' count will increase somewhat after this first round of corrections.

@drdhaval2785 Regenerated

  • verbdata.txt (from revised SanskritVerb/scripts/function.php)
  • verbdata_dupnorm.txt: now has 2 cases (17 removed).
  • verbdata_dupsutra.txt: the same number of cases (42), though the data for some of them changed along with the verbdata revisions.

dhatupatha_svara.pdf

  • This form of link is new to me on GitHub:
    https://github.com/funderburkjim/elispsanskrit/files/448794/dhatupatha_svara.pdf
    When I clicked, it downloaded the file.
    This must be some GitHub service. Is there a link on how to use the 'files' service?
  • Dhatupathas are generally associated with some scholar's name, as I understand it. For instance,
    there is the mADavIyaDAtupAWa and the Westergaard Dhatupatha, and probably others by many Sanskrit
    scholars, both modern and ancient.
    To which scholar do we attribute dhatupatha_svara.pdf?

When I clicked, it downloaded the file.
This must be some GitHub service. Is there a link on how to use the
'files' service?

Drag and drop in the issue text box. Nothing further.

To which scholar do we attribute dhatupatha_svara.pdf?

I really do not know. I believe it is available on sanskritdocuments.org.
The file has no metadata.

To which scholar do we attribute dhatupatha_svara.pdf?

When I last met Mihas in Moscow, he told me that there were 3 sources. The main source is Katre (https://yadi.sk/i/4kO_OF81uhGer and https://yadi.sk/i/klN3jLERuhGh9); I do not remember the other two, which were used for reference, but one could ask Mihas by email. We are no longer in contact, as he is on Ukraine's side (being in Belarus) and I am on Russia's.

So dhatupatha_svara.pdf was produced by 'Mihas'?

Sad about the Ukraine issue.

Here are four cases that might need correction in verbdata (from sanverb_cp_log.txt):

case 1 of duplicate verbdata key: vraRa!.01.0519.P
   vraRa!:SabdArTaH:vraR:01:0519:pa:sew:व्र॑णँ॑:277:290:293:vraN1_vraNaz_BvAxiH+SabxArWaH:
   vraRa!:SabdArTaH:vraR:01:0519:pa:sew:व्र॑णँ॑:277:290:293:vraN1_vraNaz_BvAxiH+SabxArWaH:
case 2 of duplicate verbdata key: kaWi!.10.0385.U
   kaWi!:Soke prAyeRotpUrva utkaRWAvacanaH:kaRW:10:0385:u:sew:क॑ठिँ॑:1362:1378:1415:kaNT2_kaTiz_curAxiH+Soke:
   kaWi!:Soke prAyeRotpUrva utkaRWAvacanaH:kaRW:10:0385:u:sew:क॑ठिँ॑:1362:1378:1415:kaNT2_kaTiz_curAxiH+Soke:
case 3 of duplicate verbdata key: DUpa!.10.0303.U
   DUpa!:Dupa!' BAzArTaH:Dup:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
   DUpa!:BAzArTaH:DUp:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
case 4 of duplicate verbdata key: granTa!.10.0362.U
   granTa!:banDane:granT:10:0362:u:sew:ग्र॑न्थँ॑:1342,1353:1368:1395,1406:granW3_granWaz_curAxiH+sanxarBe:261
   granTa!:sandarBe:granT:10:0362:u:sew:ग्र॑न्थँ॑:1342,1353:1368:1395,1406:granW3_gran
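The log keys above appear to have the form root.gana.number.P (or .U), with the pada field's initial upper-cased. Assuming that format, which is inferred from these four examples only, duplicates can be split into exact repeats (cases 1 and 2, where one copy can simply be dropped) and conflicting records (cases 3 and 4, which need a decision). verbdata_key and classify_duplicates are hypothetical helpers, not the code that wrote sanverb_cp_log.txt.

```python
from collections import defaultdict


def verbdata_key(line):
    """Build a key like 'vraRa!.01.0519.P' (assumed format: root with
    anubandha, gana, number, upper-cased initial of the pada field)."""
    f = line.rstrip("\n").split(":")
    return f"{f[0]}.{f[3]}.{f[4]}.{f[5].strip()[:1].upper()}"


def classify_duplicates(lines):
    """Return (exact, conflicting): keys whose records are all identical,
    and keys whose records differ somewhere."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            groups[verbdata_key(line)].append(line)
    exact, conflicting = {}, {}
    for key, recs in groups.items():
        if len(recs) < 2:
            continue  # not a duplicate key
        if len(set(recs)) == 1:
            exact[key] = recs
        else:
            conflicting[key] = recs
    return exact, conflicting
```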

vraRa! and kaWi!

Exact duplicates; removed one entry of each.

DUpa!

This is a typical case: there are two verbs under the same number, and there are some other such cases.
I propose to number one of them 10.0303a. @funderburkjim, what is your take?

granTa!

granTa! banDane is 10.0362.
granTa! sandarBe is 10.0375.
Corrected in function.php

Re: Drag and drop in the issue text box. Nothing further.

Thanks. Useful idea.

Regarding the DUpa! case, where a given sutra has two root forms:

DUpa!:Dupa!' BAzArTaH:Dup:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
DUpa!:BAzArTaH:DUp:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:

Maybe change these to

Dupa!:BAzArTaH:Dup:10:0303:u:sew:धु॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
DUpa!:BAzArTaH:DUp:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:

I would hold off on distinguishing these further by using 10.0303a for one of them,
since 10.0303 is probably a key
into a printed edition of the dhatupatha, and adding an 'a' would complicate the construction of this key.
Also, 0303 is a sequence number.
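To illustrate the concern about a suffix: if the sutra key is treated as two numbers, for example to sort records in dhatupatha order, a key like 10.0303a no longer fits. parse_sutra is a hypothetical helper, not part of SanskritVerb or pysan.

```python
import re


def parse_sutra(key):
    """Parse a 'gana.number' key such as '10.0303' into (10, 303).
    Raises ValueError for keys with non-digit parts, e.g. '10.0303a'."""
    if not re.fullmatch(r"[0-9]+\.[0-9]+", key):
        raise ValueError(f"non-numeric sutra key: {key!r}")
    gana, number = key.split(".")
    return int(gana), int(number)


# Sorting by the parsed tuple gives dhatupatha order:
print(sorted(["10.0305", "01.0519", "10.0303"], key=parse_sutra))
# prints ['01.0519', '10.0303', '10.0305']
```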

Further comment/question re DUpa!

Looking at dhatupatha_svara.pdf, I see many cases like 10:0303, in the sense of having the form

gana.number root (root1)

10:0305  cIva! (cIba!)
cIva!:BAzArTaH:cIv:10:0305:u:sew:ची॑वँ॑:1321::1374:cIv2_cIvaz_curAxiH+BARArWaH:

01:0105  zvaska! (zvazka!)
zvaska!:gatyarTaH:svazk:01:0105:A:sew:ष्व॑स्कँ॒:::::

(Numerous other examples)

So, if DUpa! were handled in verbdata like those other two instances, then there would be only
ONE record for it in verbdata.
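If entries of dhatupatha_svara.pdf were transcribed as text lines of the form 'gana:number root (root1)', they could be parsed so that the parenthesized form becomes an attribute of a single record rather than a separate headword, which is the one-record treatment described above. The line format, the regular expression and parse_entry are assumptions based only on the two examples quoted here.

```python
import re

# Hypothetical line format, e.g. "10:0305  cIva! (cIba!)":
# gana:number, whitespace, root, then an optional alternate form in parentheses.
LINE_RE = re.compile(r"^(\d+):(\d+)\s+(\S+)(?:\s+\((\S+)\))?\s*$")


def parse_entry(line):
    """Return {'key', 'root', 'alternate'} for one entry, or None if the
    line does not match the assumed format."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    gana, number, root, alternate = m.groups()
    return {"key": f"{gana}.{number}", "root": root, "alternate": alternate}
```

With this shape, cIva! and its alternate cIba! share one record keyed 10.0305, so no separate cIba! headword is created.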

This is just an observation based on some formal comparisons. I don't know the significance of all the pieces, so I do not have a definite opinion.

@funderburkjim,

It transpires that there are many such cases in dhatupatha_svara.pdf.
And not all of them were given separate headword status; e.g., there are no zvazka! or cIba! verbs in the database.
So it is best to remove Dup from the database for consistency.

Regenerated the data.