funderburkjim/elispsanskrit

Generalize the comparison system (to accommodate other databases too)

Opened this issue · 2 comments

@funderburkjim
This is just a request.
Take it up in your own time, if any time is available at all.

Right now you are comparing the data generated via MWvlex and SanskritVerb very methodically.
In the process of finding the differences and highlighting some peculiar patterns, you must have created a lot of code.
I would like these code files to be made generic enough that the comparison can be extended to Gerard's, Amba's, or anyone else's database.

I mean to say, there should be a methodology in which I can write
comparedb('dhaval','amba') and get all the details we are right now scraping for SanskritVerb and MWvlex data analysis.

I know there would be some preprocessing needed, but that can easily be taken care of.

Once the basic parameters for comparison are finalized, preprocessing the databases into that format would be the only prerequisite before we can run the analysers.
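
A minimal sketch of how such a `comparedb` might look, assuming each database has already been preprocessed into records of (verb, tense, person, pada, form). The record layout, the `databases` dictionary, the helper `index_forms`, and the report keys are all assumptions made for illustration; they are not the format used by SanskritVerb, MWvlex, or any of the other databases. The sample records are toy data.

```python
from collections import defaultdict


def index_forms(records):
    """Map each (verb, tense, person, pada) slot to the set of forms given for it."""
    index = defaultdict(set)
    for verb, tense, person, pada, form in records:
        index[(verb, tense, person, pada)].add(form)
    return index


def comparedb(name_a, name_b, databases):
    """Compare two preprocessed databases slot by slot.

    `databases` (hypothetical) maps a name such as 'dhaval' to its record list.
    """
    a = index_forms(databases[name_a])
    b = index_forms(databases[name_b])
    report = {"only_in_a": {}, "only_in_b": {}, "differ": {}}
    for key in sorted(set(a) | set(b)):
        forms_a = a.get(key, set())
        forms_b = b.get(key, set())
        if not forms_b:
            report["only_in_a"][key] = sorted(forms_a)
        elif not forms_a:
            report["only_in_b"][key] = sorted(forms_b)
        elif forms_a != forms_b:
            report["differ"][key] = (sorted(forms_a), sorted(forms_b))
    return report


# Toy data for illustration only (SLP1 transliteration).
databases = {
    "dhaval": [
        ("BU", "law", "3s", "p", "Bavati"),
        ("BU", "law", "3p", "p", "Bavanti"),
    ],
    "amba": [
        ("BU", "law", "3s", "p", "Bavati"),
    ],
}
report = comparedb("dhaval", "amba", databases)
for section, entries in report.items():
    print(section, entries)
```

With this layout, adding another database would only require a preprocessing step that emits the same five-field records.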

This way, many differences and interesting patterns can be brought to light across the various available databases of verb forms.

To be very frank, when I compared the output of my code against Gerard's and Amba's, all I did was check whether the form was not found in any of them. I didn't do a verb, tense, person, and pada wise analysis.
So a form may appear in Gerard's or Amba's database under some other tense, and I would have missed that discrepancy. I was lazy and not as methodical as you are, so I just took this rough method (it worked maybe 90% of the time), but some 10-15% may have been missed.
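
The gap between the two kinds of check can be illustrated with a small sketch. It assumes records of (verb, tense, person, pada, form); the function names and the placeholder records (`v1`, `t1`, `formX`) are hypothetical and do not come from any of the databases.

```python
def form_only_missing(records_a, records_b):
    """Rough check: flag a record only if its form string appears nowhere in B."""
    forms_b = {record[-1] for record in records_b}
    return [record for record in records_a if record[-1] not in forms_b]


def slotwise_missing(records_a, records_b):
    """Stricter check: flag a record unless B has the same form in the same slot."""
    slots_b = set(records_b)
    return [record for record in records_a if record not in slots_b]


# Placeholder records: the same form string is filed under different tenses.
mine = [("v1", "t1", "3s", "p", "formX")]
other = [("v1", "t2", "3s", "p", "formX")]

print(form_only_missing(mine, other))  # the rough check reports nothing
print(slotwise_missing(mine, other))   # the slot-wise check reports the record
```

The rough check passes here because `formX` exists somewhere in the other database, while the slot-wise check reports that it is filed under a different tense.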

Now that you are spending so much time on one-to-one concurrences of verbs, I would also like to test one-to-one concurrences of verb forms.
Best wishes.

> I mean to say, there should be a methodology in which I can write
> comparedb('dhaval','amba') and get all the details we are right now scraping for SanskritVerb and MWvlex data analysis.

Some dreams will have to remain dreams, Dhaval.