glossary feature
standardize terminology
Translators want to standardize terminology, and the Bilara TM helps them do that. But it is hard to keep track of which terms have been settled on. CATs such as Pootle often include a "terminology" feature that suggests commonly used words. Let's see if we can do something like that in Bilara.
Take Pali as the example.
sources
There is a Google sheet with a standardized list of about 1,000 terms that I used in my translations. We'd need to add Vinaya terms to that.
UI
Let's add the glossary function to a new panel that appears when clicking on a segment in Bilara. It will be much like the current TM suggestions, maybe placed above them? But it should be styled distinctly.
What's in that panel?
For each segment, analyze it with the Pali parser, then match the results against the glossary list and populate the glossary panel with the matches (a sketch of this step follows the mockup below). Let's take the segment Tatra kho bhagavā bhikkhū āmantesi as an example.
__________________________________________________________________
|
| bhagavā ✅ ❎ bhikkhu ✅ ❎ āmanteti ✅ ❎
| buddha mendicant addresses
|_________________________________________________________________
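To make the matching step concrete, here is a minimal Python sketch. The parser call is faked with canned output and the glossary is a plain dict, so treat the names and data layout as placeholders; the real Pali parser and bilara-data structures will differ.

```python
# A slice of the English glossary: Pali headword -> preferred gloss.
GLOSSARY_EN = {
    "bhagavā": "buddha",
    "bhikkhu": "mendicant",
    "āmanteti": "addresses",
}

def parse_headwords(segment: str) -> list[str]:
    """Stand-in for the Pali parser: return a headword for each word."""
    # Hard-coded output for the example segment; the real parser would
    # resolve inflected forms (bhikkhū -> bhikkhu, āmantesi -> āmanteti).
    canned = {
        "Tatra kho bhagavā bhikkhū āmantesi":
            ["tatra", "kho", "bhagavā", "bhikkhu", "āmanteti"],
    }
    return canned.get(segment, [])

def glossary_matches(segment: str) -> dict[str, str]:
    """Return {headword: gloss} for every headword with a glossary entry."""
    return {w: GLOSSARY_EN[w] for w in parse_headwords(segment) if w in GLOSSARY_EN}

print(glossary_matches("Tatra kho bhagavā bhikkhū āmantesi"))
# -> {'bhagavā': 'buddha', 'bhikkhu': 'mendicant', 'āmanteti': 'addresses'}
```

The matched headwords and glosses are exactly what the panel above displays, with the confirm/reject controls attached to each.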
- Rejected examples are dismissed and do not show on that segment, or any matching segment, again.
- Confirmed examples are entered into a JSON file in bilara-data.
This deals with the cases where the parser creates false positives. But what if there are missing terms? They will have to be added manually.
- Add a + icon for adding terms. That opens a text field.
- The text field selects from existing terms; start typing and it will suggest a term (see the sketch after this list).
- Confirm the term to be added.
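A rough sketch of how the type-ahead could work, assuming the text field simply filters the existing glossary headwords by prefix. The function and data names are illustrative only, not part of Bilara's actual API.

```python
GLOSSARY_EN = {"bhagavā": "buddha", "bhikkhu": "mendicant", "āmanteti": "addresses"}

def suggest_terms(prefix: str, glossary: dict[str, str], limit: int = 5) -> list[str]:
    """Return up to `limit` existing headwords that start with the typed prefix."""
    p = prefix.lower()
    return sorted(t for t in glossary if t.lower().startswith(p))[:limit]

print(suggest_terms("bhi", GLOSSARY_EN))  # -> ['bhikkhu']
```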
Now we have something like:
__________________________________________________________________
|
| bhagavā ✅ ❎ bhikkhu ✅ ❎ āmanteti ✅ ❎ ⨁
| buddha mendicant addresses
|_________________________________________________________________
We might want to consider multiple senses of a word, but I will leave that aside for now.
glossary creation
So far we have been considering the case of applying the existing English glossary. But translators will want to create their own glossaries. For this we'll want a third layer in the translator's own language. Let's use French!
Existing glosses are suggested, and the translator fills in those that are missing (a sketch follows the mockup below).
__________________________________________________________________
|
| bhagavā ✅ ❎ bhikkhu ✅ ❎ āmanteti ✅ ❎ ⨁
| buddha mendicant addresses
| bouddha mendiant _________
|_________________________________________________________________
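Something like this could build the three-layer rows, assuming the English and French glossaries are keyed by the same root terms; missing glosses come back blank for the translator to fill in. The names here are placeholders, not the actual file structure.

```python
GLOSSARY_EN = {"bhagavā": "buddha", "bhikkhu": "mendicant", "āmanteti": "addresses"}
GLOSSARY_FR = {"bhagavā": "bouddha", "bhikkhu": "mendiant"}  # "āmanteti" not yet glossed

def panel_rows(terms: list[str]) -> list[tuple[str, str, str]]:
    """One row per term: (root term, English gloss, French gloss or '')."""
    return [(t, GLOSSARY_EN.get(t, ""), GLOSSARY_FR.get(t, "")) for t in terms]

for row in panel_rows(["bhagavā", "bhikkhu", "āmanteti"]):
    print(row)
# ('bhagavā', 'buddha', 'bouddha')
# ('bhikkhu', 'mendicant', 'mendiant')
# ('āmanteti', 'addresses', '')   <- blank field for the translator to fill in
```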
Okay, but what if we have multiple translators in the same language? We might try to standardize on a vocabulary, but inevitably some translators will prefer their own renderings. These can be added on a per-translator basis.
Let's assume we're working in English.
__________________________________________________________________
|
| bhagavā ✅ ❎ bhikkhu ✅ ❎ āmanteti ✅ ❎ ⨁
| buddha mendicant addresses
| Blessed One monk
|_________________________________________________________________
The standard glossary would show up, with personal renderings added where needed.
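One way to think about this is as a simple override merge: the personal glossary wins where it has an entry, otherwise the standard gloss applies. A minimal sketch, with illustrative names only:

```python
STANDARD_EN = {"bhagavā": "buddha", "bhikkhu": "mendicant", "āmanteti": "addresses"}
PERSONAL_EN = {"bhagavā": "Blessed One", "bhikkhu": "monk"}  # one translator's preferences

def effective_glossary(standard: dict[str, str], personal: dict[str, str]) -> dict[str, str]:
    """Personal renderings win where present; otherwise fall back to the standard."""
    return {**standard, **personal}

print(effective_glossary(STANDARD_EN, PERSONAL_EN))
# -> {'bhagavā': 'Blessed One', 'bhikkhu': 'monk', 'āmanteti': 'addresses'}
```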
significance
Some terms occur so frequently that flagging them every time is not really useful. Bhikkhu is a good example. Do we really want a popup every time it appears? No we do not!
Perhaps we could show only the first example of a term within a given sutta?
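A possible sketch of that, assuming we walk a sutta's segments in order and drop a term from the panel once it has already been shown in that sutta. The segment IDs and matches are illustrative.

```python
def first_occurrences(matches_per_segment: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep each matched term only the first time it appears in the sutta."""
    seen: set[str] = set()
    trimmed: dict[str, list[str]] = {}
    for seg_id, terms in matches_per_segment.items():
        fresh = [t for t in terms if t not in seen]
        seen.update(fresh)
        trimmed[seg_id] = fresh
    return trimmed

matches = {
    "mn108:3.1": ["bhagavā", "bhikkhu", "āmanteti"],
    "mn108:3.2": ["bhikkhu"],  # bhikkhu has already been shown
}
print(first_occurrences(matches))
# -> {'mn108:3.1': ['bhagavā', 'bhikkhu', 'āmanteti'], 'mn108:3.2': []}
```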
JSON
Confirmed terms are taken from the Google sheet or directly from user input and added to a Bilara JSON file. Use the same format as for variant readings. The folder is glossary/en/sujato.
"mn108:3.1": "bhagavā | bhikkhu | āmanteti"
This defines which terms apply to which segment. The glosses themselves are looked up in the relevant glossary file for that language.
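For the display side, the lookup might look something like this sketch: split the pipe-separated terms stored for the segment, then resolve each one in the glossary file for the viewer's language. The file contents and names here are assumptions for illustration.

```python
import json

SEGMENT_TERMS = json.loads('{"mn108:3.1": "bhagavā | bhikkhu | āmanteti"}')
GLOSSARY_EN = {"bhagavā": "buddha", "bhikkhu": "mendicant", "āmanteti": "addresses"}

def glosses_for(segment_id: str) -> dict[str, str]:
    """Split the stored term list and look each term up in the language glossary."""
    raw = SEGMENT_TERMS.get(segment_id, "")
    terms = [t.strip() for t in raw.split("|") if t.strip()]
    return {t: GLOSSARY_EN.get(t, "") for t in terms}

print(glosses_for("mn108:3.1"))
# -> {'bhagavā': 'buddha', 'bhikkhu': 'mendicant', 'āmanteti': 'addresses'}
```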
who does it?
Any translator.
As different people work on the files, they accept, add, or delete glossary entries, winnowing down the matches. This reduces the role of the Pali parser, which will ultimately become redundant once all the terms are defined in bilara-data.
Non-English translators cannot change the English glosses (though they can of course suggest changes). But they can change whether a gloss is relevant for that segment. Those changes are universal, i.e. they apply to every language. In other words, there is only one list of terms in the root language, but many glossaries in the different languages.
This requires a degree of trust, but it should usually be a simple decision. And in any case, whether a glossary has a listing for that particular segment is not a critical issue, so long as the list is generally useful.
application
The glossary list can be applied on main SC and elsewhere if so desired. The glossary entries would then be keyed to the entries in NCPED, so students can get short definitions of important terms, even if they are not doing the full Pali lookup.
Thank you for this description.
From the perspective of a translator, this would be a very useful thing, and certainly from many other perspectives too.
Currently, to keep terminology consistent, I use various search tools, such as Bilara search and/or scv-bilara trilingual search; for word definitions I go to SuttaCentral and use the Pali lookup. Bilara TM suggestions are of course very useful, but if the segment doesn't match another instance 100%, the TM usually focuses on one passage from the segment, so I don't see suggestions that cover all the terms in that segment.
The way it is described here sounds like building the glossary can go along with the translation work, so it isn't an extra task in a different working environment. Using the tool and building/improving it go hand in hand, which sounds like a very good approach.
Would a translator who translates to a non-English language see the glossary in their own language, or would they be able to see both English and their own language? The latter would actually be desirable.
These are just some first thoughts. More will certainly come when I really see it in practice.
Would a translator who translates to a non-English language see the glossary in their own language, or would they be able to see both English and their own language?
The idea would be that they see it in the root language, English, and their own language, as per the French example above.
A translator could assemble their own glossary file in one big lot by just adding to the existing spreadsheet or whatever, or else build it bit by bit as they translate.
I can envisage that if there are several translators, each with a different glossary, the number of versions could become overwhelming. So I'm trying to think of how we can encourage a standardized way, while still allowing variations.
It's a real balance to show the right amount of information.
If you check the 84000 texts, they have a terminology feature, which works quite well.
https://read.84000.co/translation/toh287.html
I'm not sure how exactly it's created. But you can see, for example, that "mendicant Gotama" is highlighted each time it occurs, sometimes many times per paragraph. I think we can be a bit cleverer than that.
I like the way the terms are invisible until you click on the text; it keeps things clean.
84000 uses a CAT; I think it's OmegaT. I guess they create and manage their terminology in there.
Anyway you can check out their data if you like!
I can envisage that if there are several translators, each with a different glossary, the number of versions could become overwhelming. So I'm trying to think of how we can encourage a standardized way, while still allowing variations.
Yes. Probably the best approach is to start with something and see how it goes, keeping in mind that this may not be the final version. Then address problems as they arise.
I like the way the terms are invisible until you click on the text; it keeps things clean.
It's amazing, yes.