PWKVN
This documents the digitization of the Nachträge und Verbesserungen sections of PWK.
The digitization was recently prepared by Thomas Malten and his typists in India.
Much of the 'typical' markup was then added by me, and the result was prepared as a 'NEW DICTIONARY'.
Currently, there is no 'application' of the additions and corrections to the PW dictionary itself.
A specialized display allows one to investigate pwkvn along with the Schmidt (sch) and PW dictionaries.
preparation
The derivation of the current digitization proceeded in many steps, which are in the pwkvn folder.
There are numerous (28) forms present in the pwkvn/orig folder; the readme file therein briefly describes the work done, starting with the digitization prepared by @thomasincambodia.
The work in this folder is preparatory, and is included for possible reference.
The production form of the digitization is in the csl-orig repository.
digitization
The base form of the dictionary is pwkvn.txt in csl-orig/v02/pwkvn.
Several derivative files and forms are constructed by program.
These are all constructed by the redo.sh script in the csl-orig/v02/pwkvn/update/ folder:
- pwkvn_hwextra.txt: Many 'entries' in pwkvn were identified as having one or more alternate headwords; these are identified in the digitization by the markup <althws>{#X, Y, ...#}</althws>.
- update/pwkvn_hk.txt: UTF-8 encoding, retaining most of Thomas's original coding conventions, e.g.
  - Devanagari in HK transliteration, with a particular convention for accents.
  - Letter-number convention (AS) for letters with diacritics.
  - Line-break indicated by superscript 2.
- update/pwkvn_hk_ansi.txt: Same as pwkvn_hk.txt, but in the cp1252 encoding used by Thomas for compatibility with the Kedit editor.
- update/pwkvn_deva.txt: Devanagari text in Unicode Devanagari (for @Andhrabharati, @drdhaval2785 and others).
Most of these files are too big to view in a browser, but may be downloaded individually from GitHub.
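To make the relation between update/pwkvn_hk.txt and update/pwkvn_hk_ansi.txt concrete, here is a minimal Python sketch, assuming the two files differ only in character encoding (redo.sh itself is not shown here and may do more). The function names are hypothetical. Working on raw bytes leaves line endings untouched, and a character with no cp1252 equivalent raises an error instead of being silently replaced.

```python
from pathlib import Path


def utf8_to_cp1252(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as cp1252 (the 'ansi' form used with Kedit).

    Raises UnicodeEncodeError if a character has no cp1252 equivalent,
    so nothing is silently dropped.
    """
    return data.decode("utf-8").encode("cp1252")


def cp1252_to_utf8(data: bytes) -> bytes:
    """Inverse conversion, e.g. for bringing an edited ansi file back."""
    return data.decode("cp1252").encode("utf-8")


def convert_file(src: Path, dst: Path, to_ansi: bool = True) -> None:
    """Convert a whole file; reading bytes keeps the line endings as-is."""
    data = src.read_bytes()
    out = utf8_to_cp1252(data) if to_ansi else cp1252_to_utf8(data)
    dst.write_bytes(out)
```

For example, `convert_file(Path("pwkvn_hk.txt"), Path("pwkvn_hk_ansi.txt"))` would write the ansi form, and `to_ansi=False` goes the other way.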
Displays
- The B L A M displays are functioning; links are found at https://sanskrit-lexicon.uni-koeln.de/scans/PWKVNScan/2020/web/index.php
- A special display has been prepared that shows PWKVN along with the SCH and PW dictionaries: https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/
  - This is technically interesting, because it builds on web-component technology experimentation of two years ago (see specifically lit-getword05a, using LitElement, which I think is now a Google project at https://lit.dev/).
  - This link now provides a selection of various display versions.

After a review period for user comments, links will be added to the homepage https://www.sanskrit-lexicon.uni-koeln.de/.
A major chunk of work is done at last, though some more related work remains.
I can post the details if Jim is interested in this continuation.
@Andhrabharati Am interested to see your comments. Please go ahead and post the details.
@funderburkjim, I think you said you have a newer version of sch in ansi format. Please let me have it.
No 'ansi' version exists of the current digitization of sch.
The current version (non-ansi) may be downloaded from https://github.com/sanskrit-lexicon/csl-orig/blob/master/v02/sch/sch.txt.
Creating an 'ansi version' seems non-trivial; I'll let you know if it becomes available.
Currently, there is no 'application' of the additions and corrections to the PW dictionary itself.
Is there a plan for it?
I guess it should go into pw.txt appended at the end, with continuing L-numbers, if not at the end of each volume (or the respective portions) as in pwg.txt.
pw.txt appended at the end, with continuing L-numbers
@funderburkjim agree?
Abbreviations from the Nachträge are not recognized in Schmidt.
sch.txt does not have ls markup. That's the reason for Apast. Sr.
@thomasincambodia has requested that the above display also include PWG, which I plan to try.
The display also needs an 'info.html' file to explain what's going on.
pw.txt appended at the end, with continuing L-numbers
That's one possibility. Low ranking on things to do.
Currently, there is no 'application' of the additions and corrections to the PW dictionary itself.
Is there a plan for it?
A can of worms I'm leery of opening. Greater interest in continuing the improvement of ls markup in PW.
@funderburkjim
Can you use the pwkvn_hk_ansi.txt which you sent me, and which I have been working on to improve the markup?
Yes - I can convert pwkvn_hk_ansi.txt back to pwkvn.txt
It will be problematic if you change the number of lines in the file, add new markup, etc.
Best to send me a version before you do a lot, so I can see what 'improve the markup' involves.
@thomasincambodia has requested that the above display also include PWG, which I plan to try.
I wish I could see the form as PWG | PWG_VN | pwk | pwk_VN & SCH, or in other words,
(1) PWG main text, as first column,
(2) PWG VN data (sequentially volume-wise, as present in the print), if available, as second column
(3) pwk text, as third column
(4) pwk_VN data (sequentially volume-wise, as present in the print), if available, followed by SCH data, as fourth column.
[Presently the PWG_VN is shown beneath the main text in PWG; SCH is shown above & the pwk_VN below (non-chronological) in PWKVN.]
Having the VN (Annexure/Corrections) beside the main data (instead of beneath) makes it more visible, and having PWG & pwk side by side shows how the Petersburger lexicons evolved over time.
Having the VN (Annexure/Corrections) beside the main data (instead of beneath) makes it more visible
And how about extending this to all CDSL works (except MW99, which has both of them integrated; of course, some work is still pending there!)?
@Andhrabharati Am interested to see your comments. Please go ahead and post the details.
The very first point: Accents--
- The accents in pwkvn are to be rendered just as in the PWG and pwk texts; recall the prolonged discussion and exercise last year about the Devanagari accent marking specifically applicable to PWG and pwk.
Once this is done, I would start listing the other points.
[I feel, listing all points at once will not attract your attention.]
The present display at https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/ does not show the accents at all, and I see no option to enable the same.
And thus, one misses the chance to see that many accents in pwkvn are lost (or skipped) in SCH.
These accents might be of no interest/significance to Jim (as he had mentioned sometime ago elsewhere), but they would be required by the 'real serious users' of the Petersburger lexicons.
many accents in pwkvn are lost (or skipped) in SCH.
Oh, interesting @Andhrabharati
The display also needs an 'info.html' file to explain what's going on.
That would be lovely, as a lot of work has been done that I'm hardly aware of.
@thomasincambodia has requested that the above display also include PWG, which I plan to try.
PWG, but not the PWG Nachträge?
Greater interest in continuing the improvement of ls markup in PW.
You made me smile.
sch.txt does not have ls markup. That's the reason for Apast. Sr.
Can it have the same as PWK, at least for now?
I think the PWG Nachträge are completely absorbed in PWK (needs to be checked), so adding them may be only of historical interest. With the complete digitization of PWKVN 1 to 7 (+8 Last additions pp. 384-390) in hand at last (thanks to Jim's generosity), even SCH is at this point only of relevance to PWKVN in that it sometimes silently corrects printing errors in PWKVN.
So it is a moot question whether the work focus should be on SCH.
Above you write:
"These accents might be of no interest/significance to Jim (as he had mentioned sometime ago elsewhere), but they would be required by the 'real serious users' [emphasis yours] of the Petersburger lexicons."
Please be more specific, and give the actual citation of what Jim writes about accents; let me know whom you consider as "real serious users" and also whom you consider as not real serious users.
Above you write:
I feel, listing all points at once will not attract your attention. [emphasis yours]
Please clarify. Why do you feel that? Like Jim might be overwhelmed by the complexity of your list?
Above you write:
And thus, one misses the chance to see that many accents in pwkvn are lost (or skipped) in SCH.
That is certainly true, and a study of this and other errors in Schmidt's handling of PWKVN can certainly be profitably made.
Note that (roughly) only half the entries in SCH refer to PWK.
@gasyoun & @thomasincambodia
PWG, but not the PWG Nachträge?
I think the PWG Nachträge are completely absorbed in PWK (needs to be checked), so adding them may be only of historical interest.
In any case, the PWG Nachträge data is already in the Cologne PWG file (originally: sanskrit-lexicon/PWG#37 (comment)), and is shown beneath the main text where applicable (for some portion, if not all, of the volumes). Only the pwk_VN data was skipped during the pwk digitization; that has now been done, hence the debate on how to show and use it.
-------------------
@thomasincambodia
"These accents might be of no interest/significance to Jim (as he had mentioned sometime ago elsewhere), but they would be required by the 'real serious users' [emphasis yours] of the Petersburger lexicons."
Please be more specific, and give the actual citation of what Jim writes about accents; let me know whom you consider as "real serious users"
This is what Jim said at sanskrit-lexicon/PWG#5 (comment):
But, speaking personally, Devanagari accents have basically no utility to me. If I still have not mastered the vocabulary of even Hitopadesha, why worry about Devanagari or Vedic accents?
But I realize others may not view accents this way, and am willing to make changes to the display details to accomodate other views. I would like others to come to a consensus before proceeding with technical changes to slp1_deva.xml or elsewhere.
Perhaps Thomas could spend a little time going through the chain of posts/discussions from Aug '21 to Oct '21, starting at sanskrit-lexicon/PWG#5 (comment) and continuing to the end of the issue.
I consider myself a serious user, and I guess there are at least a few more across the globe; I do not want to speak about the other category of users.
I am drawn to PWG for some unknown reason (probably because it is in a different language from all the others I have dealt with so far, and also because it has many citations and has been the "base" for almost all the later Sanskrit dictionaries).
sanskrit-lexicon/AP90#17 (comment)
sanskrit-lexicon/AP90#17 (comment)
@thomasincambodia
I feel, listing all points at once will not attract your attention. [emphasis yours]
Please clarify. Why do you feel that? Like Jim might be overwhelmed by the complexity of your list?
Yes, Jim himself mentioned this some time back, @thomasincambodia!
He seems to be accustomed to seeing one point per issue heading, and my way of posting a chain of points without a gap or break seems to have made him 'skip' (most, if not all, of) them.
Of course, he has made separate issues out of my bunch of points (just about 2-3 so far), but that's a rarity.
And thus, one misses the chance to see that many accents in pwkvn are lost (or skipped) in SCH.
That is certainly true, and a study of this and other errors in Schmidt's handling of PWKVN can certainly be profitably made. Note that (roughly) only half the entries in SCH refer to PWK.
I had done a fair amount of study of the pwk_VN pages and SCH, and made some notes to myself last December, before suggesting a full typing of the pwkVN pages instead of trying to derive them from the SCH material.
#75 (comment)
[Though I initially had an intention to post my observations at that time, some later happenings have changed my mind.]
[Though I initially had an intention to post my observations at that time, some later happenings have changed my mind.]
Are you ready now?
Accents in the pwkvn display.
The display has been altered so that accents are shown. As with PW and PWG, the PWKVN display for Devanagari uses
- slp1_deva1.xml for transcoding
- siddhanta1 font.
- so the udAtta accent shows as a superscript Devanagari 'u'.
- Currently, there is no way to view the display without accents.
A sample word with accents is dvimUrDan (slp1).
In the Advanced view, I could see the text without accents. No issues at all.
It's the https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/ display that does not currently have a control for accent/noaccent.
Ok. Thanks for the clarification.
Accent control added to the csl-apidev/pwkvn display.
Initial value is 'show accent', but may be changed to 'hide accent'.
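The show/hide-accent choice can be illustrated with a small Python sketch. It assumes, hypothetically, that accents appear in the SLP1 text inside {#...#} as '/', '^' and '\' marks; the actual display code and its accent conventions are not shown here. Restricting the stripping to {#...#} spans keeps slashes in German text and URLs intact.

```python
import re

# Assumed accent marks in the SLP1 text (hypothetical for this sketch):
# '/' udatta, '^' svarita, '\' anudatta.
ACCENT_MARKS = re.compile(r"[/^\\]")
# Sanskrit text is delimited by {# ... #} in the digitization.
SANSKRIT_SPAN = re.compile(r"\{#(.*?)#\}")


def strip_accents(slp1: str) -> str:
    """Remove the assumed accent marks from one SLP1 string."""
    return ACCENT_MARKS.sub("", slp1)


def render(text: str, show_accent: bool = True) -> str:
    """Apply the show/hide-accent setting only inside {#...#} spans."""
    if show_accent:
        return text
    return SANSKRIT_SPAN.sub(
        lambda m: "{#" + strip_accents(m.group(1)) + "#}", text
    )
```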
Second point: Text portion
2. In all other dictionaries, including the recently added ARMH, the text portion after the meta-line has the form HW¦ Body. In this work, the broken bar is missing. Though this is not a major issue, it would definitely look odd if and when the data is combined with the pwk main text. This also forms the basis for the next point.
Third point: AltHWs
3a. Out of the 387 listed <althws> in the first 6 volumes, most (if not all) of them are NOT alt. HWs, but just the HW and its body content (as addition or as revision).
3b. Out of the 1212 listed <althws> in the 7th volume, the vast majority are NOT alt. HWs, but just sequential entries in the earlier (1-6) volumes. Since this portion of the 7th volume is mainly intended as an index to the previous volumes' VN entries, such sequential entries of a volume are grouped together to save print space (pages).
Hope these example screens make the point clear enough.
I suggest that the necessary action be taken to correct the above, retaining just the actual alt. HWs. As this involves deleting some lines in the portion for the first 6 volumes and adding some lines/entries (by appropriately splitting) in the volume 7 part, I am leaving this to Jim (he wants the line count of a submitted file to match as a foremost requirement, so my help may not be accepted).
Incidentally, the vol.7 index entries have some corrections (either in accent or in spelling) in the index words as against the actual entries of the first 6 volumes. So we cannot ignore those index entries altogether.
Though not so important, just like to mention that few earlier volume VN entries are missed in the vol.7 index; could it be an error or intentional by Boethlingk?
However I can prepare a list, to save Jim's time in going through all these entries to identify which to retain and which to change (if he is convinced).
most (if not all) of them are NOT alt. HWs, but just the HW and its body content (as addition or as revision).
Good point indeed.
However I can prepare a list, to save Jim's time in going through all these entries to identify which to retain and which to change (if he is convinced).
Is @funderburkjim convinced?
Hope these example screens make the point clear enough.
Indeed @Andhrabharati, thanks.
Incidentally, the vol.7 index entries have some corrections (either in accent or in spelling) in the index words as against the actual entries of the first 6 volumes. So we cannot ignore those index entries altogether.
Oh no, it's where one can go mad ))
Though not so important, just like to mention that few earlier volume VN entries are missed in the vol.7 index; could it be an error or intentional by Boethlingk?
Intentional by Boethlingk? I do not think so. He wrote about Knauer who compared PWK and PWG and found some missing entries, so he was not aware of the loss before Knauer's findings himself.
Incidentally, the vol.7 index entries have some corrections (either in accent or in spelling) in the index words as against the actual entries of the first 6 volumes. So we cannot ignore those index entries altogether.
Oh no, it's where one can go mad ))
Why so? Recall the same situation identified in the MW99 annexure, and the subsequent integration work done in Jan-Feb 2021.
Jim and I are still quite sane, not gone mad!
He wrote about Knauer who compared PWK and PWG and found some missing entries, so he was not aware of the loss before Knauer's findings himself.
You still owe me giving the Boehtlingk's letters, @gasyoun ! are the two volumes not scanned still?
Knauer who compared PWK and PWG and found some missing entries,
With the display I suggested earlier (#86 (comment)), anyone can clearly see such differences between PWG and pwk.
volume 7 hw lists and <althws>
I introduced the althws markup in pwkvn.txt as a way to deal with the headword lists in volume 7.
In the first entry of volume 7, in pwkvn.txt we have
<L>9410<pc>7-289-a<k1>a<k2>a
<althws>{#aMSa#}</althws>
<hom>2.</hom> <hw>{#a#}</hw> und <hw>{#aMSa#}</hw> I. 1.
<LEND>
This entry needs to be accessible to displays either under 'a' or 'aMSa' (slp1).
The althws tag causes a 'duplicate' record to be made in pwkvn.xml (with L=9410.1).
<H1><h><key1>a</key1><key2>a</key2></h><body>
<hom>2.</hom> <s>a</s> und <s>aMSa</s> I. 1. </body>
<tail><L>9410</L><pc>7-289-a</pc></tail></H1>
<H1><h><key1>aMSa</key1><key2>aMSa</key2></h><body>
<alt><s>aMSa</s> is an alternate of <s>a</s>.</alt> <hom>2.</hom> <s>a</s> und <s>aMSa</s> I. 1. </body>
<tail><L>9410.1</L><pc>7-289-a</pc><hwtype n="alt" ref="9410"/></tail></H1>
Since the displays are built on pwkvn.xml, the 9410.1 record shows the entry under a search for 'aMSa'.
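The thread describes what the althws markup does, but not the code that builds pwkvn.xml. As a hedged sketch of that behavior only, the Python below turns one pwkvn.txt entry into a main record plus one numbered alternate record per althws headword, mirroring the 9410 / 9410.1 example above. The Record type and the plain-text note are hypothetical simplifications of the XML.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Metaline such as <L>9410<pc>7-289-a<k1>a<k2>a
METALINE = re.compile(r"<L>(?P<L>[^<]+)<pc>(?P<pc>[^<]+)<k1>(?P<k1>[^<]+)")
# <althws>{#X, Y, ...#}</althws>
ALTHWS = re.compile(r"<althws>\{#(?P<hws>.*?)#\}</althws>")


@dataclass
class Record:
    L: str
    pc: str
    key: str
    body: str
    ref: Optional[str] = None  # for alternate records: L of the main record


def expand_entry(lines: List[str]) -> List[Record]:
    """Turn one entry (metaline, optional <althws> line, body, <LEND>)
    into the main record plus one record per alternate headword."""
    m = METALINE.match(lines[0])
    if m is None:
        raise ValueError("not a metaline: " + lines[0])
    L, pc, k1 = m["L"], m["pc"], m["k1"]
    alts: List[str] = []
    body_lines: List[str] = []
    for line in lines[1:]:
        if line.startswith("<LEND>"):
            break
        a = ALTHWS.match(line)
        if a:
            alts = [h.strip() for h in a["hws"].split(",") if h.strip()]
        else:
            body_lines.append(line)
    body = " ".join(body_lines)
    records = [Record(L, pc, k1, body)]
    for i, alt in enumerate(alts, start=1):
        note = f"{alt} is an alternate of {k1}. "
        records.append(Record(f"{L}.{i}", pc, alt, note + body, ref=L))
    return records
```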
I think this althws idea is a reasonable way to handle the headword lists in volume 7.
The only general change that might be needed is in the phrase
Y is an alternate of X
Perhaps this should be either changed or omitted entirely.
volume 1-6 althws
Before adding the <althws> markup, I added the <hw>X</hw> markup in pwkvn.txt.
This was done on the basis of patterns, the simplest being {#X#} und {#Y#}, which was marked as <hw>{#X#}</hw> und <hw>{#Y#}</hw>. For example, in volume 1:
<L>115<pc>1-283-a<k1>agfhapati<k2>agfhapati
<althws>{#agfhapatika#}</althws>
<hw>{#*agfhapati#}</hw> und <hw>{#*°ka#}</hw> <is>gaṇa</is> {#cArvAdi#}.
<LEND>
In this case, I thought the entry should be accessible by either agfhapati or agfhapatika, hence the althws markup.
But I used several other patterns, and perhaps some of these should be considered wrongly marked.
For example, maybe the very first entry should be considered as having only one headword <hw>a</hw> and no althws.
suggestion
Why don't you first prepare a file with the entries that should have NO althws at all (i.e., there is an althws markup, but there should be only one <hw>X</hw>, which is the first one)?
The only thing needed in this file is the L-numbers (for instance, L=2).
Then I can easily remove the althws markup (and the extra hw-tags) for these cases from pwkvn.txt.
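Given such a list of L-numbers, the removal could look roughly like the following Python sketch. It assumes each entry starts with a <L>n<pc>... metaline and carries its <althws> markup on a separate line, as in the examples above; the function names are hypothetical. Note that dropping the althws line changes the file's line count.

```python
import re
from typing import Iterable, List, Set, Tuple

L_NUMBER = re.compile(r"^<L>([^<]+)<pc>")
HW_TAG = re.compile(r"<hw>(.*?)</hw>")


def unwrap_extra_hws(line: str, seen_first: bool) -> Tuple[str, bool]:
    """Keep the entry's first <hw>...</hw>; strip the tags from later ones."""
    parts: List[str] = []
    pos = 0
    for m in HW_TAG.finditer(line):
        parts.append(line[pos:m.start()])
        parts.append(m.group(1) if seen_first else m.group(0))
        seen_first = True
        pos = m.end()
    parts.append(line[pos:])
    return "".join(parts), seen_first


def drop_althws(lines: Iterable[str], targets: Set[str]) -> List[str]:
    """Remove the <althws> line and the extra hw-tags from entries whose
    L-number is in targets; all other lines pass through unchanged."""
    out: List[str] = []
    in_target = False
    seen_first = False
    for line in lines:
        m = L_NUMBER.match(line)
        if m:  # a new entry begins
            in_target = m.group(1) in targets
            seen_first = False
        elif in_target and line.startswith("<althws>"):
            continue  # drop the althws line entirely
        elif in_target:
            line, seen_first = unwrap_extra_hws(line, seen_first)
        out.append(line)
    return out
```

Purely as a mechanical illustration, applying it to entry 115 above would keep <hw>{#*agfhapati#}</hw> and leave {#*°ka#} untagged.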
You still owe me giving the Boehtlingk's letters, @gasyoun ! are the two volumes not scanned still?
One volume, but so big. The compiler died a year ago. I'll make paper one day and maybe will scan it for that.
sch.txt does not have ls markup. That's the reason for Apast. Sr.
Can it have same as PWK at least for now?
@gasyoun Someone needs to
- add the basic ls markup in sch.txt
- create the corresponding 'schauth/tooltip.txt' file (such as based on the front-matter list given for sch).
- the format could be like that for the Benfey auth tooltips
I wish I could see the form as PWG | PWG_VN ...
Currently, the pwg digitization contains both the main text and the VN material.
The VN material seems to be in two sections:
- end of volume 5: VERBESSERUNGEN UND NACHTRÄGE ZU THEIL I-V.
  - extends from line 617185 of pwg.txt to line 737375 of pwg.txt
  - from <L>62404<pc>5-0941<k1>a<k2>a<h>3 to <L>80800<pc>5-1678<k1>mluc<k2>mluc
- end of volume 7: Verbesserungen und Nachträge zum ganzen Werke.
  - extends from line 1122155 through line 1149413 of pwg.txt
  - from <L>117929<pc>7-1685<k1>a<k2>a<h>3 to <L>122732<pc>7-1822<k1>hevAkin<k2>hevAkin
To do what you suggest, we would need at least to make a separate 'pwgvn' digitization (by extracting the two VN sections from pwg), and then doing the necessary things to have a display for a new 'pwgvn' dictionary.
An interesting idea, but I don't want to commit to doing it now. Will focus first on adding the current pwg to display.
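Extracting the two VN sections by the line ranges listed above could be sketched in Python as follows. The ranges are copied from that list and are only valid for that version of pwg.txt; the function and file names are hypothetical. This only separates the lines; a display for a new 'pwgvn' dictionary would still need the further steps mentioned.

```python
from typing import Iterable, List, Sequence, Tuple

# 1-based, inclusive line ranges of the two VN sections in pwg.txt,
# as reported above; they must be updated whenever pwg.txt changes.
VN_RANGES: List[Tuple[int, int]] = [
    (617185, 737375),    # end of volume 5
    (1122155, 1149413),  # end of volume 7
]


def split_main_vn(
    lines: Iterable[str], ranges: Sequence[Tuple[int, int]] = VN_RANGES
) -> Tuple[List[str], List[str]]:
    """Partition lines into (main, vn) according to the 1-based ranges."""
    main: List[str] = []
    vn: List[str] = []
    for n, line in enumerate(lines, start=1):
        if any(lo <= n <= hi for lo, hi in ranges):
            vn.append(line)
        else:
            main.append(line)
    return main, vn


def split_file(src: str, main_out: str, vn_out: str) -> None:
    """Write the main text and the VN material to two separate files."""
    # newline="" keeps the original line endings on read and write
    with open(src, encoding="utf-8", newline="") as f:
        main, vn = split_main_vn(f)
    with open(main_out, "w", encoding="utf-8", newline="") as f:
        f.writelines(main)
    with open(vn_out, "w", encoding="utf-8", newline="") as f:
        f.writelines(vn)
```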
The VN material seems to be in two sections:
* end of volume 5: VERBESSERUNGEN UND NACHTRÄGE ZU THEIL I-V.
  * extends from line 617185 of pwg.txt to line 737375 of pwg.txt
  * From `<L>62404<pc>5-0941<k1>a<k2>a<h>3` to `<L>80800<pc>5-1678<k1>mluc<k2>mluc`
* end of volume 7: Verbesserungen und Nachträge zum ganzen Werke.
  * extends from line 1122155 through line 1149413 of pwg.txt
  * From `<L>117929<pc>7-1685<k1>a<k2>a<h>3` to `<L>122732<pc>7-1822<k1>hevAkin<k2>hevAkin`
Please recall the discussion at sanskrit-lexicon/PWG#39: the material belonging to Vol. 1-4 and Vol. 6 is not always reflected in the VN matter of Vol. 5 or Vol. 7; in many cases where those entries are repeated, they are further 'revised' in Vol. 5/7. [You have also stored my posted file (for future use?): https://github.com/sanskrit-lexicon/PWG/issues/39#issuecomment-887932380]
I therefore propose that you include the VN material from all the volumes, as is done in the case of pwk.
In pwg.txt, I don't see any entries that should be considered as VN entries except those mentioned above.
Why don't you first prepare a file with the entries that should have NO althws at all (i.e. there is an althws markup but there should be only one
<hw>X</hw>
which is the first one.
The only thing needed in this file is the L-numbers (for instance, L=2).
As I started making the list, I saw that the very first entry is marked as <L>2 instead of <L>1.
On further inspection, I noticed that at every <H> there is a jump in the <L> number; in total, 9 of them are skipped:
<L>1
<L>1769
<L>3238
<L>4954
<L>5974
<L>8175
<L>9408
<L>9409
<L>22069
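Gaps like these can also be found mechanically. Here is a minimal Python sketch, assuming metalines begin with <L>n<pc> as in the examples above; the format of the <H> lines is not shown here, so they are simply ignored, and the function name is hypothetical.

```python
import re
from typing import Iterable, List

# Integer L-number of a metaline such as <L>2<pc>1-283-a<k1>...
METALINE_L = re.compile(r"^<L>(\d+)(?:\.\d+)?<pc>")


def missing_L(lines: Iterable[str], start: int = 1) -> List[int]:
    """Return the L-numbers absent from start, start+1, ... up to the
    largest L seen. Assumes ascending order; an L lower than one already
    seen is ignored rather than reported."""
    missing: List[int] = []
    expected = start
    for line in lines:
        m = METALINE_L.match(line)
        if not m:
            continue  # <H>, body and <LEND> lines are skipped
        n = int(m.group(1))
        if n > expected:
            missing.extend(range(expected, n))
        expected = max(expected, n + 1)
    return missing
```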
I looked at Part 1 (Vol. 1-6) of pwkvn and made the list, taking the pwk main data as the reference for deciding the alt. HWs.
non-althws entries (Vol. 1-6).txt
[As the count is not even 40% (of 387), my estimation of "most (if not all)" is clearly wrong, but the count is quite large indeed.]
If this is found suitable, I shall look into the Vol. 7 data (Part 2) next.
To do what you suggest, we would need at least to make a separate 'pwgvn' digitization (by extracting the two VN sections from pwg), and then doing the necessary things to have a display for a new 'pwgvn' dictionary.
An interesting idea, but I don't want to commit to doing it now. Will focus first on adding the current pwg to display.
Just in case you happen to change your mind, @funderburkjim, here are the split portions of pwg.txt (latest from the csl-orig repo):
pwg_main.zip
pwg_VN.zip
You might also consider converting the Vol. 1-4 & 6 VN data (the first 1057 lines of pwg_orig.txt in https://www.sanskrit-lexicon.uni-koeln.de/scans/PWGScan/2013/downloads/pwgtxt.zip) to the present format, and including it either in pwg.txt as is or in the split pwg_VN portion, for use in displaying the PWM text.
experimental versions
There are now 2 versions of the experimental display which include pwg.
The link https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/ provides further links to the variants.
Comments solicited.
(plan to soon revise the list of 'althws')
This screen shows the point I had mentioned earlier, that Vol.7 entries often give the accent corrections in other volumes' VN entries.
I would also like to show what I did with SCH before concluding that many of its entries differ from pwkvn in accent (lost/skipped accents).
That is, I made Devanagari strings (with pw-style accents) from the IAST strings of SCH, to enable a clear visual comparison.
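The thread does not show how these Devanagari strings were produced. As a hedged illustration only, here is a minimal Python transliterator (greedy longest match over a small hypothetical table) covering the basic IAST letters, anusvāra and visarga. It does not handle accents: an accented vowel such as á is simply copied to the output, so producing pw-style accent marks would need further rules.

```python
import unicodedata

VIRAMA = "\u094d"

# IAST vowel -> (independent form, dependent sign after a consonant)
VOWELS = {
    "a": ("अ", ""), "ā": ("आ", "ा"), "i": ("इ", "ि"), "ī": ("ई", "ी"),
    "u": ("उ", "ु"), "ū": ("ऊ", "ू"), "ṛ": ("ऋ", "ृ"), "ṝ": ("ॠ", "ॄ"),
    "ḷ": ("ऌ", "ॢ"), "e": ("ए", "े"), "ai": ("ऐ", "ै"), "o": ("ओ", "ो"),
    "au": ("औ", "ौ"),
}
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", "ṅ": "ङ",
    "c": "च", "ch": "छ", "j": "ज", "jh": "झ", "ñ": "ञ",
    "ṭ": "ट", "ṭh": "ठ", "ḍ": "ड", "ḍh": "ढ", "ṇ": "ण",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
    "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "ś": "श", "ṣ": "ष", "s": "स", "h": "ह",
}
MARKS = {"ṃ": "ं", "ḥ": "ः"}  # anusvara, visarga


def _match(text: str, i: int):
    """Longest known token at position i ('kh' before 'k', 'ai' before 'a')."""
    for n in (2, 1):
        cand = text[i:i + n]
        if cand in VOWELS or cand in CONSONANTS or cand in MARKS:
            return cand
    return None


def iast_to_deva(text: str) -> str:
    """Transliterate unaccented IAST to Devanagari (basic letters only)."""
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    pending = False  # True while the last consonant has no vowel yet
    i = 0
    while i < len(text):
        tok = _match(text, i)
        if tok in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant cluster
            out.append(CONSONANTS[tok])
            pending = True
        elif tok in VOWELS:
            independent, sign = VOWELS[tok]
            out.append(sign if pending else independent)
            pending = False
        else:
            # anusvara/visarga or any other character ends a cluster
            if pending:
                out.append(VIRAMA)
            out.append(MARKS[tok] if tok else text[i])
            pending = False
            tok = tok or text[i]
        i += len(tok)
    if pending:
        out.append(VIRAMA)  # word-final consonant
    return "".join(out)
```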
Among the Cologne texts, only MW99 seems to have such additional work (all transliterated Skt. words marked so that they can be shown in Devanagari script).
Probably CDSL might also consider doing this across all works showing Skt. words just in IAST, like BUR etc.
Probably CDSL might also consider doing this across all works showing Skt. words just in IAST, like BUR etc.
@Andhrabharati If you think this idea should be pursued further, please open a new issue in the cologne repository: https://github.com/sanskrit-lexicon/cologne/issues.
As for this issue 86, I am closing.