Malten corrections (German words/bot tags)
Closed this issue · 13 comments
bot frequency data
- bot_freq_pw_0.txt: frequency count for bot tags in pw
- bot_freq_mw.txt: similar list for the mw dictionary
- bot_freq_pw_0_withmw.txt: frequency count for bot tags in pw, with the corresponding counts in mw
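A minimal sketch of how frequency lists like these could be produced. It is not the program actually used for this issue. It assumes that botanical names are marked up as `<bot>...</bot>` on a single line; the function names `bot_tag_frequencies` and `merge_counts` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: a bot tag opens with <bot> and closes with </bot> on the same line.
BOT_RE = re.compile(r"<bot>(.*?)</bot>")


def bot_tag_frequencies(lines):
    """Count how often each <bot>...</bot> text occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(text.strip() for text in BOT_RE.findall(line))
    return counts


def merge_counts(pw_counts, mw_counts):
    """Pair each pw count with the mw count for the same text (0 if absent).

    Rows are sorted by descending pw count, then alphabetically.
    """
    ordered = sorted(pw_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, n, mw_counts.get(name, 0)) for name, n in ordered]


# Made-up sample lines standing in for pw.txt and mw.txt.
pw_lines = [
    "<bot>Ficus religiosa</bot> und <bot>Ficus indica</bot>",
    "<bot>Ficus religiosa</bot>",
]
mw_lines = ["the tree <bot>Ficus religiosa</bot>"]

rows = merge_counts(bot_tag_frequencies(pw_lines), bot_tag_frequencies(mw_lines))
for name, pw_n, mw_n in rows:
    print(f"{name}\t{pw_n}\t{mw_n}")
```

To run it on the real files, pass an open file object (for example `open("pw.txt", encoding="utf-8")`) in place of the sample lists.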
German word corrections.
See german directory.
- Thomas's changes: the starting point
- change_pw_2.txt and change_pw_3.txt: change files with annotations. 161 lines of pw.txt changed. These changes are now installed in the csl-orig repository (csl-orig/v02/pw/pw.txt).
- unchanged.txt: 15 items from Thomas's changes that did not generate changes to pw.txt. Have asked @maltenth to review.
German word corrections from non-italic text
Work done in issue101/german1 directory.
429 lines changed.
pre_change1_regular.txt, pre_change1_irregular.txt, and pre_change1_thomas.txt describe the changes.
change_word_regular.txt, change_word_irregular.txt, and change_word_thomas.txt show the exact changes made.
Here 'regular' just means that the changes were generated by program from the pre-changes.
I developed all the regular and irregular changes by checking against the scans.
What remains to do?
- bot-tag corrections - hopefully Thomas will now focus on this. When the bot-tag corrections are finished, this issue can be closed.
- italic text markup - I think there are numerous errors in the italic markup. This is believed to be a difficult task. It should be tackled in a new issue, after the bot-tag corrections.
Have heard from @maltenth -- He intends to work on the botanical names in pwk.
Nice to hear this, @funderburkjim !
Incidentally, it may be of some interest that while MW has just 45 unique instances incl. the Botanist's name(s), pwk has more than 4 times that (~190).
More interesting is that PWG often gives only the Botanist's name(s) separately, with a different Sanskrit name of the plant/tree that has the same Sc. name, in contrast to the current entry name.
And should we have the Botanist's name(s) also expanded somewhere?
It is seen that over 50 such names occur in just these three works (PWG, pwk and MW).
Another observation:
While MW has >80% of <bot>-entries with a Cap. letter on the 2nd (or later) word, pwk has just <30% of such.
[Interestingly, PWG has this ratio close to (just above) 'half'.]
So the Capitalisation being made THE 'norm' in the Sc. names across the CDSL works "stands" debatable!!
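The ratio described above can be sketched roughly as follows. This is only an illustration of the measurement, not the method actually used for the figures quoted. It assumes each name is the plain text of one `<bot>` entry with words separated by whitespace; the function names and the sample names are hypothetical. Note that a trailing botanist's abbreviation would also count as a capitalised later word under this rule, so a real run might need to strip those first.

```python
def has_capitalized_later_word(name):
    """True if the 2nd or any later word of the name starts with a capital letter."""
    words = name.split()
    return any(word[:1].isupper() for word in words[1:])


def capitalized_ratio(names):
    """Fraction of names whose 2nd (or later) word is capitalised; 0.0 if no names."""
    names = list(names)
    if not names:
        return 0.0
    hits = sum(has_capitalized_later_word(name) for name in names)
    return hits / len(names)


# Made-up sample; real input would be the unique <bot> texts of one dictionary.
sample = ["Ficus Religiosa", "Ficus indica", "Mimosa Catechu", "Butea frondosa"]
print(f"{capitalized_ratio(sample):.0%} of sample names have a capitalised later word")
```

Running the same function over the unique `<bot>` texts of MW, pwk and PWG separately would give the three percentages to compare.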
> And should we have the Botanist's name(s) also expanded somewhere?
Makes sense.
> Capitalisation being made THE 'norm' in the Sc. names across the CDSL works "stands" debatable
Agree @Andhrabharati
BTW, I've found some more 'grouped' entries in pwk now.
An interesting observation:
While both MW and pwk have almost the same count of unique <zoo> entities, pwk has nearly 200 more unique <bot> entities than MW.
@Andhrabharati Am working with Thomas on the bot issue.
Will report here when the time appears right to do so.
Closing this issue, since I have been unable to reach @maltenth since 8/24/2024.
In April 2024 I did some work on the botanical names, and will present that work in another issue.