soimort/translate-shell

Inconsistent inclusion of nikud in Hebrew results

NeatNit opened this issue · 2 comments

Not sure whether this is a bug in this tool or somewhere further upstream, but I'm seeing inconsistent inclusion of nikud (Hebrew phonetic notation, also spelt niqqud or nikkud, noted here so future searches find this issue) when translating words into Hebrew:

[Screenshot: Termux, 2024-01-13 16:04:43]

~ $ trans -b -no-bidi en:he hello
שלום
~ $ trans -b -no-bidi en:he more
יותר
~ $ trans -b -no-bidi en:he less
פָּחוֹת
~ $ trans -b -no-bidi en:he lesser
קָטָן יוֹתֵר
~ $ trans -b -no-bidi en:he indeed
אכן
~ $ trans -b -no-bidi en:he element
אֵלֵמֶנט
~ $ trans -b -no-bidi en:he opposite
מול
~ $ trans -b -no-bidi en:he above
מֵעַל

Seemingly at random, some results include nikud and some do not. For example, "less" is translated with nikud, but "more" is not.

I noticed this passage in the docs, which I initially thought was relevant and suggested this was a bug:

In brief mode, phonetic notation (if any) is not shown by default. To enable this, put an at sign “@” in front of the language code

But while typing this and trying the documented example with and without the flag, I realised that it refers to something completely different, unrelated to optional marks like nikud in the target language's own script.

Either way, for consistent output I think it should always show a translation without nikud. Nikud is extremely rare in everyday writing, although it always appears in dictionaries.

I noticed that the full (non-brief) output does show good options without nikud:

[Screenshot: Termux, 2024-01-13 16:33:18]

Version info:

Translate Shell       0.9.7.1

platform              Linux
terminal type         xterm-256color
bi-di emulator        [N/A]
gawk (GNU Awk)        5.3.0
fribidi (GNU FriBidi) 1.0.13
audio player          mpv --no-config
terminal pager        less
web browser           xdg-open
user locale           en_US.UTF-8 (English)
host language         en
source language       auto
target language       en
translation engine    auto
proxy                 [NONE]
user-agent            Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 Edg/104.0.1293.54
ip version            [DEFAULT]
theme                 default
init file             [NONE]

Running in Termux on Android.

As far as I'm aware, the output of trans is consistent with Google Translate (https://translate.google.com/), which does include nikud most of the time.

[Screenshots: 2024-01-15 17:04:10 and 17:05:48]

Since trans is just a command-line interface and is mostly language-agnostic, we can't fix this on our side unless Google's API returns both nikud-marked text and plain text, which it currently does not.

If you want translations without nikud, I suggest using Bing as the engine (trans -e bing):

[Screenshot: 2024-01-15 17:16:57]