sanskrit-lexicon/PWK

Malten corrections (German words/bot tags)

Closed this issue · 13 comments

@maltenth expressed an interest in resolving the spelling variations seen in the text of the <bot> tags in pwk.

Here is an 'hk' version of pw.
pw_hk.zip
(Note this is based on csl-orig/v02/pw/pw.txt at commit d847fe33dd4e2626ebf8869325e0bca452a5f20d)

bot frequency data
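A frequency list of this kind can be produced with a short script. The following is only a minimal sketch, assuming the botanical names are wrapped as `<bot>...</bot>` on a single line; the function name `bot_frequencies` and the sample lines are hypothetical and not taken from pw.txt.

```python
import re
from collections import Counter

# Assumption: each botanical name is wrapped as <bot>...</bot> within one line.
BOT_RE = re.compile(r"<bot>(.*?)</bot>")

def bot_frequencies(lines):
    """Count how often each distinct <bot> text occurs."""
    counts = Counter()
    for line in lines:
        for text in BOT_RE.findall(line):
            counts[text.strip()] += 1
    return counts

# Hypothetical sample lines for illustration only.
sample = [
    "eine Art <bot>Ficus religiosa</bot> und <bot>Ficus indica</bot>",
    "<bot>Ficus religiosa</bot>",
    "<bot>Ficus Religiosa</bot>",
]
freq = bot_frequencies(sample)
for text, n in freq.most_common():
    print(n, text)
```

Spelling variants such as 'Ficus religiosa' and 'Ficus Religiosa' show up as separate entries, which is what makes the list useful for spotting variations.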

German word corrections.

See german directory.

German word corrections from non-italic text

Work done in issue101/german1 directory.

429 lines changed.

pre_change1_regular.txt, pre_change1_irregular.txt, and pre_change1_thomas.txt describe the changes.

change_word_regular.txt, change_word_irregular.txt, and change_word_thomas.txt show the exact changes made.

Here 'regular' just means that the changes were generated by program from the pre-changes.

All the regular and irregular changes were developed by me by reference to the scans.

What remains to do?

  1. bot-tag corrections - hopefully Thomas will now focus on this. When the bot-tag corrections are finished, this issue can be closed.
  2. italic text markup - I think there are numerous errors in the italic markup. Fixing them is expected to be difficult, so it should be tackled in a new issue, after the bot-tag corrections.

Have heard from @maltenth: he intends to work on the botanical names in pwk.

Nice to hear this, @funderburkjim !

Incidentally, it may be of some interest that while MW has just 45 unique instances incl. the Botanist's name(s), pwk has more than 4 times that (~190).

More interesting is that PWG many times has only the Botanist's name(s) separately, with a different Sanskrit name of the plant/tree having the same Sc. name, in contrast to the current entry name.

And should we have the Botanist's name(s) also expanded somewhere?
It is seen that over 50 such names occur in just these three works (PWG, pwk and MW).

Another observation:

While in MW >80% of the <bot> entries have a 2nd (or later) word starting with a capital letter, in pwk just <30% do.
[Interestingly, PWG has this ratio close to (just above) 'half'.]

So making capitalisation THE 'norm' for Sc. names across the CDSL works is debatable!
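The ratio discussed above could be estimated roughly as follows. This is a sketch under stated assumptions, not the method actually used for the figures quoted: it assumes `<bot>...</bot>` markup on a single line, counts unique names only, and treats any capitalised 2nd (or later) word alike, so abbreviated botanist names are not distinguished from species epithets. The names `has_later_capital` and `later_capital_ratio` and the sample lines are hypothetical.

```python
import re

# Assumption: each botanical name is wrapped as <bot>...</bot> within one line.
BOT_RE = re.compile(r"<bot>(.*?)</bot>")

def has_later_capital(name):
    """True if any word after the first starts with an uppercase letter."""
    words = name.split()
    return any(w[:1].isupper() for w in words[1:])

def later_capital_ratio(lines):
    """Share of unique <bot> names with a capitalised 2nd (or later) word."""
    names = {m.strip() for line in lines for m in BOT_RE.findall(line)}
    if not names:
        return 0.0
    return sum(has_later_capital(n) for n in names) / len(names)

# Hypothetical sample lines for illustration only.
sample = [
    "<bot>Ficus religiosa</bot>",
    "<bot>Ficus Religiosa</bot>",
    "<bot>Mimosa Catechu</bot> <bot>Ocimum sanctum</bot>",
]
ratio = later_capital_ratio(sample)
print(f"{ratio:.0%} of unique names have a capitalised later word")
```

Running the same function over each dictionary's text would give comparable per-work figures, provided the markup assumption holds for all of them.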

And should we have the Botanist's name(s) also expanded somewhere?

Makes sense.

making capitalisation THE 'norm' for Sc. names across the CDSL works is debatable

Agree @Andhrabharati

Incidentally, it may be of some interest that while MW has just 45 unique instances incl. the Botanist's name(s), pwk has more than 4 times that (~190).

See what G.J. Meulenbeld says regarding this:

image

Here is a quick summary of the counts:

image

In pwk, I had opted NOT to apply the capitalisation.

My work on MW markup has helped a lot in marking (or changing from bot to zoo) the additional <zoo> words (these could not be marked earlier, as they were not clearly identifiable in pwk 'as-is').

BTW, I've found some more 'grouped' entries in pwk now.

An interesting observation:

While both MW and pwk have almost the same count of unique <zoo> entities, pwk has nearly 200 more unique <bot> entities than MW.

@Andhrabharati Am working with Thomas on the bot issue.
Will report here when the time appears right to do so.

Closing this issue, since I have been unable to reach @maltenth since 8/24/2024.
In April 2024 I did some work on the botanical names, and will present that work in another issue.