ambuda-org/vidyut

Joining Visarga and Svara

Closed this issue · 10 comments

इ॒षे त्वो॒,-र्जे त्वा॑, वा॒यव॑ः स्थोपा॒यव॑ः स्थ, दे॒व व सवत प्रार्प॑यतु॒ श्रेष्ठ॑तमाय॒ कर्म॑ण॒, आ प्या॑यदध्वमघ्निया देवभा॒ग-मूर्ज॑स्वती॒ः पय॑स्वतीः प्र॒जवत-रनमी॒वा अ॑य॒क्ष्मा मा व॑ः स्ते॒न ई॑शत॒ माऽघश(ग्म्)॑सो, रु॒द्रस्य॑ हे॒तिः परि॑ वो वृणक्तु, ध्रवा अ॒समिन् गोप॑तौ स्यात ब॒ह्वीर्,
यज॑मानस्य प॒शून् पा॑हि ॥ १ (इ॒षे - त्रिच॑त्वारि(ग्म्)शत् )

The accents mess with the joining on the visargas.
If I remember right, we need to make sure all the accents precede the visarga (fixing the input if they don't) and then add a zero-width joiner (U+200D) between the svara and the visarga.

Thanks! Can you give me a correct example as well? I tried this in JavaScript:

// a, svarita, ZWJ, visarga
const x = "\u0905\u0951\u200d\u0903";

And the result is अ॑‍ः which seems incorrect still.

@skmnktl following up here

So "aqH" renders as "अ॒ः" on vidyut-lipi, but aksharamukha renders it as "अः॒". I'm not at my computer, but I thought I'd answer your question for now. I can break that down into Unicode code points later.

Vidyut produces:

U+0905 : DEVANAGARI LETTER A
U+0952 : DEVANAGARI STRESS SIGN ANUDATTA {Vedic tone anudatta}
U+0903 : DEVANAGARI SIGN VISARGA

Aksharamukha does:

U+0905 : DEVANAGARI LETTER A
U+0903 : DEVANAGARI SIGN VISARGA
U+0952 : DEVANAGARI STRESS SIGN ANUDATTA {Vedic tone anudatta}

The issue seems to be the order: the visarga combines with the accent, but the accents only combine with vowels. That said, when doing indic, you'd need to reverse the order back though.
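
For anyone who wants to reproduce the two listings above, here is a minimal sketch that dumps the code points of a string. The helper name `codePoints` is made up for illustration and is not part of vidyut-lipi or Aksharamukha; the two input strings are simply the sequences listed in this comment.

```javascript
// Hypothetical helper (not from vidyut-lipi or Aksharamukha):
// list the Unicode code points of a string as "U+XXXX" labels.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const vidyutOutput = "\u0905\u0952\u0903";       // a, anudatta, visarga
const aksharamukhaOutput = "\u0905\u0903\u0952"; // a, visarga, anudatta

console.log(codePoints(vidyutOutput).join(" "));       // U+0905 U+0952 U+0903
console.log(codePoints(aksharamukhaOutput).join(" ")); // U+0905 U+0903 U+0952
```

The two strings contain the same three code points; only the position of U+0952 relative to U+0903 differs.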

Thanks, this is helpful.

That said, when doing indic, you'd need to reverse the order back though.

What do you mean by this? I understand "For Devanagari and other Indic scripts, accents should come after the vowel and before the visarga". What should be done for romanizations, if anything?

Actually I think it should be:

For Indic scripts: vowel+visarga+accent
For Roman scripts: vowel+accent+visarga

I just meant above that you'd have to invert the visarga and accent going from roman <=> indic and you'd keep the order the same when going roman => roman or indic => indic.
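
A rough sketch of that inversion, operating on Devanagari code points only. The function names `toIndicOrder` and `toRomanOrder` are hypothetical, and the accent set is limited to the two marks mentioned in this thread (U+0951 and U+0952). This is an illustration of the ordering rule, not how vidyut-lipi implements it; in a real romanization the visarga and accents would be different characters.

```javascript
// Assumption: only the two accent marks discussed in this thread are handled.
// U+0951 = svarita (udatta stress sign), U+0952 = anudatta, U+0903 = visarga.
const ACCENT_THEN_VISARGA = /([\u0951\u0952])\u0903/g;
const VISARGA_THEN_ACCENT = /\u0903([\u0951\u0952])/g;

// vowel+accent+visarga -> vowel+visarga+accent (the order proposed for Indic scripts)
function toIndicOrder(text) {
  return text.replace(ACCENT_THEN_VISARGA, "\u0903$1");
}

// vowel+visarga+accent -> vowel+accent+visarga (the order proposed for Roman scripts)
function toRomanOrder(text) {
  return text.replace(VISARGA_THEN_ACCENT, "$1\u0903");
}

const romanOrdered = "\u0905\u0952\u0903"; // a, anudatta, visarga
const indicOrdered = toIndicOrder(romanOrdered);
console.log(indicOrdered === "\u0905\u0903\u0952");           // true
console.log(toRomanOrder(indicOrdered) === romanOrdered);     // true
```

Both functions leave text alone when it is already in the target order, so roman => roman and indic => indic conversions keep the order unchanged.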

ah, I see! Thanks, this is clear enough for me to start preparing a fix.

I'm working on this now.

This is fixed locally. Pushing soon.

Pushed and deployed to the demo.