Final small tsu ッ not transliterated

Question

Final small tsu ッ not transliterated

nicolas-raoul opened this issue 9 years ago · 6 comments

Answer 1 · 2016-06-25T08:48:59.000Z

I've come up with a workaround for this that consists of merging 2 consecutive tokens into 1; I'm going to send you a pull request for this too!

Btw, a small tsu at the end of the word may also indicate an exclamation mark.

Answer 2 · 2016-06-25T13:54:14.000Z

Thanks :-)

Answer 3 · 2020-04-09T21:22:19.000Z

Hello!

Thank you for building this tool!

I am running into tokens that are either:

not converted to romaji (they end up in the romaji string as kanji because the code sees them as "サ変接続" and inserts the surface token into the string buffer). Example: 誕生
not inserted into the romaji string buffer at all. Example: もらった (only the final 'ta' gets inserted into the romaji string buffer)

The source for the Jakaroma class has a variable, lastTokenToMerge, that does not seem to get used in the end.

Do you have any suggestions or thoughts? I'm a beginner with Java, so I'm not sure how much I can contribute, but if you point me in the right direction, I'm happy to try and push things ahead a bit. For now I've taken the stopgap approach of creating an ArrayList of exceptions to the "サ変接続" classification. Entries must be added to it manually as these occurrences arise, and they then get correctly converted and inserted into the romaji string buffer. Probably not the best way forward, but it makes me feel like I'm making some sort of progress each time there is a problem with it :)

For the small tsu issue, it looks like someone had started to implement a fix, but the code doesn't actually merge the token ending with small tsu with the next one (if I'm understanding the intent correctly). Was this 'lastTokenToMerge' variable supposed to be evaluated by another if clause, that tells the next token to prepend it to itself (and I imagine, double the first consonant)? I'm going to implement that here for myself but wanted to make sure I had understood your intent?
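To make the idea above concrete, here is a minimal sketch of how a token ending in small tsu could hand a "pending" flag to the next token, which then doubles its first consonant. This is not Jakaroma's actual code: the class name `SokuonCarry`, both methods, and the `kanaToRomaji` parameter are hypothetical, and the sketch assumes each token is available as a katakana reading and that the romaji is lowercase. It needs Java 9 or later for `List.of` and `Map.of`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class SokuonCarry {
    private static final char SMALL_TSU = 'ッ';

    /** Doubles the leading consonant of a lowercase romaji chunk; anything else is returned unchanged. */
    static String doubleFirstConsonant(String romaji) {
        if (romaji.isEmpty()) {
            return romaji;
        }
        char first = romaji.charAt(0);
        boolean isAsciiLetter = first >= 'a' && first <= 'z';
        if (!isAsciiLetter || "aiueo".indexOf(first) >= 0) {
            return romaji; // vowels, kanji, punctuation: nothing to double
        }
        if (romaji.startsWith("ch")) {
            return "t" + romaji; // Hepburn writes a doubled "ch" as "tch", e.g. "matcha"
        }
        return first + romaji;
    }

    /**
     * Joins the romaji of consecutive tokens. A token ending in small tsu is
     * converted without it, and the next token gets its first consonant doubled.
     * kanaToRomaji is a stand-in for whatever per-token conversion is in use.
     */
    static String romanize(List<String> katakanaTokens, UnaryOperator<String> kanaToRomaji) {
        StringBuilder out = new StringBuilder();
        boolean pendingSokuon = false;
        for (String token : katakanaTokens) {
            boolean endsWithSokuon =
                !token.isEmpty() && token.charAt(token.length() - 1) == SMALL_TSU;
            String body = endsWithSokuon ? token.substring(0, token.length() - 1) : token;
            String romaji = kanaToRomaji.apply(body);
            if (pendingSokuon) {
                romaji = doubleFirstConsonant(romaji);
            }
            out.append(romaji);
            pendingSokuon = endsWithSokuon;
        }
        // A small tsu on the very last token has nothing to double and is dropped here.
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy lookup table standing in for a real kana-to-romaji converter.
        Map<String, String> table = Map.of("モラ", "mora", "タ", "ta", "スゴ", "sugo");
        UnaryOperator<String> lookup = s -> table.getOrDefault(s, s);
        System.out.println(romanize(List.of("モラッ", "タ"), lookup)); // prints "moratta"
        System.out.println(romanize(List.of("スゴッ"), lookup));       // prints "sugo"
    }
}
```

Carrying a boolean rather than the whole previous token keeps the loop simple. If the final small tsu should become an exclamation mark instead of being dropped, that would be one extra check on `pendingSokuon` after the loop.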

Thanks again for making this tool!

Answer 4 · 2020-04-10T01:25:23.000Z

@malkazoid Thanks for the feedback! Unfortunately I don't remember much of the code and have other very busy projects, but I am looking forward to your pull requests :-)

Answer 5 · 2020-04-11T10:38:25.000Z

I just downloaded the tool and tested it a bit; indeed, the behavior is very broken.
もらった returns Ta whereas it should return Moratta, which by the way means that the っ needs to look at the next letter and double it.
誕生 returns 誕生 whereas it should return Tanjo- or similar
誕生日 returns 誕生Bi whereas it should return Tanjo-bi or similar
すごっ returns Sugo which is not bad, Sugo! would be good too I guess.
ピッザ returns ピッザ whereas it should return Pizza
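For the small tsu cases in this list, the "look at the next letter and double it" rule can be sketched at the character level. This is only an illustration, not the tool's implementation: the class `KanaSokuon` and its methods are hypothetical, the table is deliberately tiny (just the kana in these examples), and long vowels such as the one in 誕生 are out of scope. It relies on the fact that hiragana U+3041..U+3096 map to katakana by adding 0x60, and needs Java 9 or later for `Map.of`.

```java
import java.util.Map;

class KanaSokuon {
    // Deliberately tiny table: only the kana needed for the examples above.
    private static final Map<Character, String> TABLE = Map.of(
        'モ', "mo", 'ラ', "ra", 'タ', "ta", 'ス', "su",
        'ゴ', "go", 'ピ', "pi", 'ザ', "za");

    /** Shifts hiragana (U+3041..U+3096) into the katakana block, which sits 0x60 higher. */
    static char toKatakana(char c) {
        return (c >= 0x3041 && c <= 0x3096) ? (char) (c + 0x60) : c;
    }

    /** Converts a kana string to capitalized romaji, doubling the consonant after a small tsu. */
    static String romanize(String kana) {
        StringBuilder out = new StringBuilder();
        boolean doubleNext = false;
        for (int i = 0; i < kana.length(); i++) {
            char c = toKatakana(kana.charAt(i));
            if (c == 'ッ') {
                doubleNext = true; // remember it; the next kana decides what to double
                continue;
            }
            String romaji = TABLE.get(c);
            if (romaji == null) {
                out.append(kana.charAt(i)); // unknown character: pass through unchanged
            } else {
                if (doubleNext && "aiueo".indexOf(romaji.charAt(0)) < 0) {
                    out.append(romaji.charAt(0));
                }
                out.append(romaji);
            }
            doubleNext = false;
        }
        // A trailing small tsu has no following consonant and is simply dropped.
        if (out.length() > 0) {
            out.setCharAt(0, Character.toUpperCase(out.charAt(0)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(romanize("もらった")); // prints "Moratta"
        System.out.println(romanize("ピッザ"));   // prints "Pizza"
        System.out.println(romanize("すごっ"));   // prints "Sugo"
    }
}
```

Working on the whole kana string handles a small tsu in the middle of a word (ピッザ) and at a token boundary (もらっ + た) the same way, provided the token readings are concatenated before conversion.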

Answer 6 · 2020-04-11T10:57:59.000Z

Great, we're on the same page. I'll fix as much of this as I can and put in a pull request. Thx!
