what's the difference between 'broad' and 'narrow' tsv?

Question

fake-warrior8 opened this issue 4 years ago · 3 comments

Sorry, I couldn't find a description of the broad and narrow TSV files. Could you tell me what the difference between them is?

Answer 1 · 2021-06-05T13:00:32.000Z

It’s a standard term referring to how precise the transcription is: https://en.wikipedia.org/wiki/Phonetic_transcription#Narrow_versus_broad_transcription

Answer 2 · 2021-06-05T14:02:33.000Z

Thank you for your reply! I have another question. I'm studying the multilingual G2P system of the SIGMORPHON 2020 shared task, which uses data scraped with WikiPron. I found that the multilingual dataset includes a broad French dataset and a narrow Armenian dataset. However, the Wikipedia article you linked says that 'A further disadvantage of narrow transcription is that it involves a larger number of symbols and diacritics that may be unfamiliar to non-specialists', which means narrow and broad transcriptions may use somewhat different phoneme symbols. So I wonder: will mixing broad and narrow transcriptions confuse a multilingual G2P system? Or is there any other disadvantage to mixing broad and narrow transcription datasets?
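One way to make the "different symbols" concern concrete is to compare the phone inventories of two files. The following is only a rough sketch: it assumes WikiPron-style rows of the form `word<TAB>space-separated phones`, the function name `phone_inventory` is made up for illustration, and the rows are toy examples rather than real shared task data.

```python
from collections import Counter


def phone_inventory(tsv_lines):
    """Count phone symbols in rows shaped like: word<TAB>space-separated phones."""
    counts = Counter()
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Ignore any extra columns after the pronunciation.
        _word, pron = line.split("\t")[:2]
        counts.update(pron.split())
    return counts


# Toy rows invented for illustration; not taken from the real datasets.
broad_rows = ["chat\tʃ a", "eau\to"]
narrow_rows = ["chat\tʃ a", "pin\tpʰ ɪ n"]

broad_inv = phone_inventory(broad_rows)
narrow_inv = phone_inventory(narrow_rows)

# Symbols that occur only in the narrow rows.
only_narrow = sorted(set(narrow_inv) - set(broad_inv))
print(only_narrow)
```

With real files, the same function could be fed an open file handle instead of a list, since it only iterates over lines.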

Answer 3 · 2021-06-05T14:38:41.000Z

You are right that some of the SIGMORPHON 2020 shared task data is broad and other data is narrow. (The 2021 shared task data, which is definitely higher quality, is now out too.) We usually choose whichever is larger and/or higher-quality for the shared task.

I see your concern. I should note that how individual Wiktionary contributors define broad vs. narrow differs from language to language. (Broad is simply whatever is between slashes, and narrow is whatever is in square brackets, and this is entirely at the discretion of the Wiktionary contributors.) Furthermore, I would say we don't really know how multilingual sequence-to-sequence neural network models work anyway, so I'd be hesitant to make any strong guesses as to how this might affect things. In the case of Armenian vs. French, the two languages use totally disjoint scripts, so you might guess that they don't really do much parameter-sharing in a multilingual model. It could even be that the mix of broad and narrow is actually beneficial, perhaps through a sort of regularization effect.
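As a small illustration of the delimiter-based distinction, here is a minimal sketch that labels a raw transcription string by its enclosing delimiters. It assumes the usual IPA convention of `/slashes/` for broad (phonemic) and `[square brackets]` for narrow (phonetic) transcriptions; the function name `transcription_level` is hypothetical and is not part of WikiPron.

```python
def transcription_level(pron: str) -> str:
    """Label a raw transcription by its delimiters (assumed convention)."""
    pron = pron.strip()
    if len(pron) >= 2:
        if pron[0] == "/" and pron[-1] == "/":
            return "broad"
        if pron[0] == "[" and pron[-1] == "]":
            return "narrow"
    # No recognizable delimiters, e.g. an already-stripped TSV field.
    return "unknown"


for example in ["/ʃa/", "[ʃa]", "ʃa"]:
    print(example, transcription_level(example))
```

Note that this only reflects which delimiters a contributor chose; as said above, it does not guarantee a consistent level of phonetic detail across languages.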
