bragefuglseth/keypunch

[Language Request]: Bangla/Bengali

Closed this issue · 9 comments

English Name

Bangla/Bengali

Native Name

বাংলা

Orthography

The Bengali alphabet is derived from the Brahmi script and is closely related to Devanagari. Bengali is the 7th most spoken language in the world; it is the official language of Bangladesh and the 2nd most spoken language in India.

Basics

Bengali has 50 letters: 11 vowels (অ, আ, ই, ঈ, উ, ঊ, ঋ, এ, ঐ, ও, ঔ) and 39 consonants (ক, খ, গ, ঘ, ঙ,
চ, ছ, জ, ঝ, ঞ, ট, ঠ, ড, ঢ, ণ, ত, থ, দ, ধ, ন, প, ফ, ব, ভ, ম, য, র, ল, শ, ষ, স, হ, ড়, ঢ়, য়, ৎ, ং, ঃ, ঁ).
Vowels can appear at the beginning, in the middle, or at the end of a word. Example: (লি, আশ, স). The same goes for consonants. Example: কলম -> ক, ল, ম, each a consonant in a different position.

Diacritics

When we join a vowel to a consonant, we use the short form of that vowel (a vowel diacritic). These are called KAR (কার). Bengali has 10 vowel diacritics (া, ি, ী, ু, ূ, ে, ৈ, ো, ৌ, ৃ). They can be placed after (সাপ), before (বিষ), below (কুটিল), or both before and after the consonant (পৌর).
There are also 7 consonant diacritics, called PHOLA (ফলা), that can join with a vowel or a consonant. We use the hôsôntô (্) for this operation. Examples:
য-ফলা -> অ + ্ + য -> অ্য -> অ্যাপ্লিকেশন
ব-ফলা -> শ + ্ + ব -> শ্ব -> বিশ্বাস
ম-ফলা -> ন + ্ + ম -> ন্ম -> তন্ময়
ণ-ফলা -> হ + ্ + ণ -> হ্ণ -> অপরাহ্ণ
ন-ফলা -> ত + ্ + ন -> ত্ন -> রত্ন
রেফ -> র + ্ + শ -> র্শ -> বর্শ
র-ফলা -> ক + ্ + র -> ক্র -> ক্রম
ল-ফলা -> ল + ্ + ল -> ল্ল -> বল্লম

Consonant Conjuncts

A conjunct is a combination of two consonants. There are a lot of them. Consonant diacritics are also a form of conjunct, but vowel diacritics are not. We write them the same way we write consonant diacritics. Example:
ক্ক - ক + ্ + ক
ক্ট - ক + ্ + ট
ক্ষ - ক + ্ + ষ
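
The conjunct rule above can be sketched in code. This is only an illustration, not anything from Keypunch: the helper name `conjunct` is hypothetical, and the code points are written as escapes so the stored order is explicit.

```python
import unicodedata

HASANTA = "\u09cd"  # ্  BENGALI SIGN VIRAMA (hôsôntô)

def conjunct(*consonants: str) -> str:
    """Join consonants with a hôsôntô between each pair (hypothetical helper)."""
    return HASANTA.join(consonants)

KA = "\u0995"   # ক
TTA = "\u099f"  # ট
SSA = "\u09b7"  # ষ

kka = conjunct(KA, KA)     # ক + ্ + ক
ktta = conjunct(KA, TTA)   # ক + ্ + ট
kssa = conjunct(KA, SSA)   # ক + ্ + ষ

# Show what is actually stored for the last conjunct: three code points.
for ch in kssa:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Even though a conjunct renders as one shape, it is stored (and typed) as consonant, hôsôntô, consonant.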

Punctuation Marks

Same as English. One exception is that we use the DARI (।) instead of the full stop (.), with a space before and after it when the sentence ends. Example:
রফিক মাছ ধরতে গিয়েছে ।

Writing

Bengali has no letter case, so there are no capital or small letters. On Linux I use the built-in Bangla (Probhat) layout for writing. Whatever the layout, the way of writing is almost the same. Here are some basic rules:

  • When typing, vowel diacritics always come after the consonant. Example:
    ি + ব + ষ - িবষ ❌
    ব + ি + ষ - বিষ ✅
  • র ফলা (one of the consonant diacritics) can go before or after a consonant, and its form changes with its position. When it goes before the consonant it is called Ref (রেফ); when it goes after, it is called R-Phola (র-ফলা). Example:
    রেফ -> র + ্ + শ -> র্শ -> বর্শ
    র-ফলা -> ক + ্ + র -> ক্র -> ক্রম
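
The two typing rules above could be expressed roughly like this. It is a minimal sketch under stated assumptions: the function names `ref`, `ro_phola` and `has_leading_mark` are made up for illustration and do not come from Keypunch or any keyboard layout.

```python
import unicodedata

HASANTA = "\u09cd"  # ্
RA = "\u09b0"       # র
KA = "\u0995"       # ক
SHA = "\u09b6"      # শ

def ref(consonant: str) -> str:
    """Ref: র + ্ comes BEFORE the consonant in typing order."""
    return RA + HASANTA + consonant

def ro_phola(consonant: str) -> str:
    """R-Phola: ্ + র comes AFTER the consonant in typing order."""
    return consonant + HASANTA + RA

def has_leading_mark(word: str) -> bool:
    """True if the word starts with a combining mark such as a vowel diacritic."""
    return bool(word) and unicodedata.category(word[0]).startswith("M")

good = "\u09ac\u09bf\u09b7"  # ব + ি + ষ, diacritic after its consonant
bad = "\u09bf\u09ac\u09b7"   # ি + ব + ষ, diacritic typed first
print(has_leading_mark(good), has_leading_mark(bad))
```

The vowel sign ি is drawn to the left of its consonant, but it is still typed and stored after it, which is why a leading mark is a useful signal that the order is wrong.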

probhat

Writing some Bangla using Probhat (QWERTY)

বাংলা আমার মাতৃভাষা । বৃহন্নলার পাঁচ ভাই ক্ষমতার লোভে মত্ত ।
baLla vmar maf<BaSa . b<hn/nlar pa>c BaI k/Smfar l]B[ mf/f .
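
To make the key-to-letter correspondence in this sample easier to follow, here is a small sketch. The mapping is inferred only from the two sample lines above, so it covers just those keys and is not the full Probhat layout; `SAMPLE_KEYS` and `type_probhat` are hypothetical names.

```python
# Key -> character pairs inferred from the writing sample only (assumption:
# one key press produces exactly one code point for these keys).
SAMPLE_KEYS = {
    "a": "\u09be",  # া
    "b": "\u09ac",  # ব
    "B": "\u09ad",  # ভ
    "c": "\u099a",  # চ
    "f": "\u09a4",  # ত
    "h": "\u09b9",  # হ
    "I": "\u0987",  # ই
    "k": "\u0995",  # ক
    "l": "\u09b2",  # ল
    "L": "\u0982",  # ং
    "m": "\u09ae",  # ম
    "n": "\u09a8",  # ন
    "p": "\u09aa",  # প
    "r": "\u09b0",  # র
    "S": "\u09b7",  # ষ
    "v": "\u0986",  # আ
    "<": "\u09c3",  # ৃ
    ">": "\u0981",  # ঁ
    "/": "\u09cd",  # ্ (hôsôntô)
    "[": "\u09c7",  # ে
    "]": "\u09cb",  # ো
    ".": "\u0964",  # । (dari)
}

def type_probhat(keys: str) -> str:
    """Map each key press to its character; unknown keys (e.g. space) pass through."""
    return "".join(SAMPLE_KEYS.get(key, key) for key in keys)

print(type_probhat("baLla vmar maf<BaSa ."))
```

Note how `k/S` produces the conjunct in ক্ষমতার: the hôsôntô key sits between the two consonant keys.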

Implementation Assistance

  • I am proficient enough in this language to spot mistakes and unnatural words
  • I can assist with testing and reviewing the language implementation

Additional Information

No response

Hi, thanks a lot for the language request! That writing sample (বাংলা আমার মাতৃভাষা ।) was really helpful. I'm currently spinning up an initial implementation of Bangla text generation, but I'm a little confused about the space before the dari sign. When implementing text generation for Hindi (#6) and Nepali (#5), I never encountered this convention, and upon doing some further research, I've discovered that Microsoft's Bangla (India) Localization Style Guide doesn't recommend it either:

A punctuation mark (৷) indicating a full stop, placed at the end of declarative sentences and other statements thought to be complete. There is no space between the last letter and the period. Use one space between the period and the first letter of the next sentence.

If you think it makes sense for the extra space to be there for Bangla specifically, and not Hindi and Nepali, I'll gladly go ahead and set that up. However, for the sake of consistency across all the Indic languages in Keypunch, I'm currently inclined to use the convention of no space between the dari and its preceding word for all three of them 🙂
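
For reference, the no-space convention described here could be applied to existing text along these lines. This is a rough sketch, not Keypunch's text generator; `tighten_dari` is a hypothetical name, and it only handles the dari (U+0964).

```python
import re

DARI = "\u0964"  # ।

def tighten_dari(text: str) -> str:
    """Drop whitespace before each dari and keep one space after it mid-text."""
    # "word ।" -> "word।"
    text = re.sub(rf"\s+{DARI}", DARI, text)
    # "।next" or "।   next" -> "। next"; a dari at the very end is left alone.
    text = re.sub(rf"{DARI}\s*(?=\S)", DARI + " ", text)
    return text

print(tighten_dari("ab \u0964 cd \u0964"))
```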

Thank you very much. You can ignore the extra spacing before (।).
I built the app from the repo using GNOME Builder and tested it for a few minutes. I found a few problems:

  1. য় (z) and ড় (R) do not work in simple and advanced mode.
  2. কিন্ত (kin/f) -> ন + ্ + ত -> ন্ত ❌ ; কিন্তু (kin/fu) -> ন + ্ + ত + ু -> ন্তু ✅
  3. Bangla has its own numerals:
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) -> (০, ১, ২, ৩, ৪, ৫, ৬, ৭, ৮, ৯)। I apologize for not mentioning this in the original issue.
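
The digit correspondence in item 3 is a plain one-to-one mapping, which could be sketched as below. The function name `to_bangla_digits` is hypothetical and not part of Keypunch.

```python
# Western digits 0-9 mapped to Bangla digits ০-৯ (U+09E6 through U+09EF).
_TO_BANGLA = str.maketrans(
    "0123456789",
    "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef",
)

def to_bangla_digits(text: str) -> str:
    """Replace every Western digit in text with its Bangla counterpart."""
    return text.translate(_TO_BANGLA)

print(to_bangla_digits("2024"))
```

Going the other way needs no table in Python, since `int()` accepts any Unicode decimal digit, Bangla digits included.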

য় (z) and ড় (R) does not work in simple and advance mode.

I've discovered that Monkeytype has the exact same issue, and it's related to character representation. In the word list, those letters are stored as two characters: a base shape and a modifier character for the dot. In modern Bangla text encoding, though, those letters can also be represented as a single character that has the dot included, and that's what people usually enter on keyboards. To the computer, these two representations are completely different character sequences.

Monkeytype apparently has to represent them the former way due to technical constraints, but since Keypunch uses GTK's native text machinery instead of rolling its own, I don't think we have the same issue. So a quick fix I'll try for now is to just replace the "outdated" letters with their modern counterparts.
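
The replacement described above might look something like this. It is a sketch under assumptions, not the actual Keypunch fix: the names `SINGLE_FORMS`, `use_single_codepoints` and `same_text` are hypothetical, and the three letters covered are the dotted letters that Unicode encodes both ways.

```python
import unicodedata

NUKTA = "\u09bc"  # ়  the dot, as a separate combining character

# Two-character form (base letter + dot) -> single-character form.
SINGLE_FORMS = {
    "\u09a1" + NUKTA: "\u09dc",  # ড + ় -> ড়
    "\u09a2" + NUKTA: "\u09dd",  # ঢ + ় -> ঢ়
    "\u09af" + NUKTA: "\u09df",  # য + ় -> য়
}

def use_single_codepoints(word: str) -> str:
    """Rewrite base+dot pairs so they match what keyboards usually enter."""
    for two_chars, one_char in SINGLE_FORMS.items():
        word = word.replace(two_chars, one_char)
    return word

def same_text(typed: str, expected: str) -> bool:
    """Alternative: compare after normalizing both sides to the same form (NFD)."""
    return unicodedata.normalize("NFD", typed) == unicodedata.normalize("NFD", expected)

stored = "\u09b9\u09af\u09bc\u09c7"  # word-list entry using the two-character form
fixed = use_single_codepoints(stored)
print(len(stored), len(fixed), stored == fixed, same_text(stored, fixed))
```

Rewriting the word list keeps comparison a plain string equality check, while normalizing at comparison time would accept either representation from the keyboard.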

কিন্ত (kin/f) -> ন + ্ + ত -> ন্ত ❌ ; কিন্তু (kin/fu) -> ন + ্ + ত + ু -> ন্তু ✅

Both of those spellings exist in the word list. I assume that the first one should be removed? It would be good to open an issue against Monkeytype as well, then. That's where the original list is from.

I haven't looked at the numbers yet, but the other mistakes should be fixed.

These 4 words have a problem: নিয়ে হয়ে দিয়ে হয়েছে. The right spellings are given below:

নিয়ে ( niz[ );‌
হয়ে ( hz[ );
দিয়ে ( qiz[ );
হয়েছে ( hz[C[ )

Could you give it a go again now? 🙂

By the way, if you'd like, you can provide a name (and optionally a website link or an email address), and I'll credit you in the Orthography section of the about window.

Everything works perfectly now! You can close this now.

I'll credit you in the Orthography section of the about window.

I would be honored if you included my name, Arnob Goswami. Thank you for your consideration.

I'm very glad to hear that! Thank you so much for your help.