suttacentral/bilara-data

Bilara upgrade roadmap

sujato opened this issue

For the second half of 2023, we will focus on a slate of improvements, extensions, and upgrades to Bilara. The aim is to make it a more flexible and fully featured computer-assisted translation (CAT) webapp for SuttaCentral's needs and for further envisaged projects.

This page gives an overview, which will be fleshed out in separate issues.

Note that the issues are the definitive record; this is just an overview.

current

Bilara coordinates data in sequential "segments". Content is drawn from bilara-data, and the webapp displays at minimum "root" and "translation" text for the user.

  • "Root" is read-only.
  • "Translation" is read/write.
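
As a rough illustration of the segment model, here is a minimal Python sketch. It assumes each bilara-data file is a flat JSON object keyed by segment ID; the IDs and texts below are placeholders, and `align_segments` is a hypothetical helper, not part of Bilara.

```python
import json

# Placeholder file contents (assumption: a flat JSON object keyed by segment ID).
root_json = '{"mn1:0.1": "root heading", "mn1:1.1": "root line one", "mn1:1.2": "root line two"}'
translation_json = '{"mn1:0.1": "Translated heading", "mn1:1.1": "Translated line one"}'


def align_segments(root: dict[str, str], translation: dict[str, str]) -> list[dict[str, str]]:
    """Pair each root segment with its translation, in root order.

    Untranslated segments get an empty string so the row still displays.
    """
    return [
        {"id": seg_id, "root": text, "translation": translation.get(seg_id, "")}
        for seg_id, text in root.items()
    ]


rows = align_segments(json.loads(root_json), json.loads(translation_json))
for row in rows:
    print(row["id"], "|", row["root"], "|", row["translation"])
```

The root drives the row order, and the translation is looked up by segment ID, so a missing translation simply shows as an empty cell.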

A "project" is set up on Bilara for the user, which defines the scope of their translation. A translator may only write to their own project. Authorization is handled by GitHub. The "translation memory" suggests the user's previous translations of similar texts.
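
To give a feel for what a translation memory lookup does, here is a toy Python sketch using the standard library's `difflib`. This is only an illustration of fuzzy matching, not how Bilara's TM actually works (that runs in ArangoDB); `suggest`, the cutoff value, and the sample texts are all assumptions.

```python
import difflib


def suggest(root_text: str, memory: dict[str, str], n: int = 3, cutoff: float = 0.6) -> list[tuple[str, str]]:
    """Return (previous root text, previous translation) pairs similar to root_text.

    `memory` maps root text already translated by this user to its translation.
    """
    matches = difflib.get_close_matches(root_text, list(memory), n=n, cutoff=cutoff)
    return [(match, memory[match]) for match in matches]


# Placeholder memory; real entries would come from the user's own project.
memory = {
    "so it was said": "Thus it was said",
    "completely different": "Something else entirely",
}
hits = suggest("so it was said by him", memory)
for previous_root, previous_translation in hits:
    print(previous_root, "->", previous_translation)
```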

The user can display other data from bilara-data, such as additional translations, variant readings, HTML, etc., which are all read-only.

Additionally, they can open a "comments" field in which to write comments.

The backend is similar to SuttaCentral's: JSON data, Python scripting, ArangoDB for the database, translation memory, and search, and Lit on the front end.

The web UI has a degree of flexibility:

  • drag and drop columns to reorder
  • adjust width of search field (which is on every page)
  • change colors (using CSS variables)

One characteristic we will keep: the site uses raw HTML for things like select, dialogs, and so on, rather than JS replacements. Keep it simple and robust! It's a utility; we don't need to win any style awards.

Note that we require no backwards compatibility. Use modern standards for evergreen browsers.

Below I list the major new updates desired.

redo architecture with HTMX

Rather than a partial SPA style architecture, we propose using HTMX, with HTML served over the wire. This will simplify routing and queries.

  • JSON data is consumed by
  • Python, which populates
  • ArangoDB, which is queried with
  • AQL (ArangoDB Query Language, a declarative language that resembles SQL but is not a dialect of it), and served to
  • HTMX in the front end.

For the front end, build with HTML-first.

  • structure content with semantic HTML
  • style with CSS
  • use JS only when absolutely necessary.
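
A minimal sketch of what "HTML over the wire" could look like on the Python side: the server returns a ready-made HTML fragment, and HTMX attributes tell the browser where to send edits. The `/segment/...` route and the `render_segment_row` helper are hypothetical, not existing Bilara code; only the `hx-*` attribute names are standard HTMX.

```python
from html import escape


def render_segment_row(seg_id: str, root: str, translation: str) -> str:
    """Return an HTML fragment for one segment row.

    Assumption: edits are sent with PUT to a hypothetical /segment/<id> route,
    and the server answers with a fresh copy of the same row.
    """
    sid = escape(seg_id)
    return (
        f'<tr id="seg-{sid}">'
        f'<th scope="row">{sid}</th>'
        f'<td>{escape(root)}</td>'
        f'<td><textarea name="translation" '
        f'hx-put="/segment/{sid}" hx-trigger="change" '
        f'hx-target="closest tr" hx-swap="outerHTML">'
        f'{escape(translation)}</textarea></td>'
        f'</tr>'
    )


print(render_segment_row("mn1:1.1", "root text", "translated text"))
```

All user-supplied text goes through `escape`, so the fragment stays plain semantic HTML with no client-side templating.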

corrections/suggestions

Currently there is no method for making suggestions such as a proofreader or checker would do. We need a simple web UI that will allow this. The suggestions should show up as a list available on the Home page, and the translator should be able to easily accept, reject, or edit them.

Suggestions need not be stored on GitHub; they can just live in the database.

Suggestions can be made by:

  • any translator
  • any proofreader (with proof rights as below)
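
The accept/reject/edit flow could be sketched roughly like this. The `Suggestion` record, its field names, and the `resolve` function are assumptions for illustration; the real storage would be in the database.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    segment_id: str
    author: str
    proposed_text: str
    status: Status = Status.PENDING


def resolve(suggestion: Suggestion, translation: dict[str, str], accept: bool,
            edited_text: Optional[str] = None) -> None:
    """Apply the translator's decision on one pending suggestion.

    Accepting writes the proposed text (or the translator's edited version)
    into the translation; rejecting leaves the translation untouched.
    """
    if suggestion.status is not Status.PENDING:
        raise ValueError("suggestion already resolved")
    if accept:
        new_text = edited_text if edited_text is not None else suggestion.proposed_text
        translation[suggestion.segment_id] = new_text
        suggestion.status = Status.ACCEPTED
    else:
        suggestion.status = Status.REJECTED
```

The Home page list would then simply be every suggestion whose status is still pending for the translator's project.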

suttacentral/bilara#14

create three grades of user: proof, normal, and super

Currently all users have the same rights, allowing them to edit their own translation but no other.

The suggestions UI will require a new kind of user, one who can make suggestions only but not edit text or translation.

In addition, we want a superuser who can edit root text and HTML, etc. Thus we need three grades:

  • normal user (write their project translation and comments, read everything else)
  • proof user (write suggestions, read everything else)
  • superuser (write anything)
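
The three grades boil down to a small permission check. Here is a sketch under stated assumptions: the field names are made up, and normal users are also allowed to write suggestions, since the suggestions section says any translator can make them.

```python
from enum import Enum


class Role(Enum):
    NORMAL = "normal"
    PROOF = "proof"
    SUPER = "super"


def can_write(role: Role, field: str, owns_project: bool = False) -> bool:
    """Return True if a user with this role may write to the given field.

    `field` is a hypothetical label such as "root", "html", "translation",
    "comment", or "suggestion". Reading is allowed for everyone.
    """
    if role is Role.SUPER:
        return True  # superusers write anything
    if field == "suggestion":
        return True  # assumption: both proof and normal users may suggest
    if role is Role.PROOF:
        return False  # proof users write suggestions only
    # Normal users write translation and comments, in their own project only.
    return owns_project and field in {"translation", "comment"}
```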

suttacentral/bilara#148

make spreadsheet-like ability to split and merge rows

The main purpose of this is for expert speakers of root languages, so that they can adjust the segmentation of the text via the web app. This is especially important when adding new root texts.

Superusers will be able to split, edit, and combine the rows of the text, a bit like the "add new row" and "combine row" functions of a spreadsheet. Currently we do this with Bilara i/o: export the data as TSV, edit it in a spreadsheet, then import it again. This is powerful, but clumsy and error-prone, and not something a regular user would do.

The important thing is that all associated Bilara-data is updated properly.
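
To make "all associated data is updated properly" concrete, here is a sketch of split and merge that touches every field at once. It assumes each field (root, translation, comments, and so on) is a dict keyed by segment ID, that the caller supplies the new segment ID, and that no renumbering of later segments is needed; how IDs are really assigned is left open.

```python
Fields = dict[str, dict[str, str]]


def split_segment(fields: Fields, field: str, seg_id: str, offset: int, new_id: str) -> Fields:
    """Split seg_id of `field` at `offset`, adding new_id right after it.

    Every other field that has seg_id gets an empty new_id segment,
    so the rows stay aligned across fields.
    """
    result: Fields = {}
    for name, segments in fields.items():
        updated: dict[str, str] = {}
        for sid, text in segments.items():
            if sid != seg_id:
                updated[sid] = text
            elif name == field:
                updated[sid] = text[:offset].rstrip()
                updated[new_id] = text[offset:].lstrip()
            else:
                updated[sid] = text
                updated[new_id] = ""
        result[name] = updated
    return result


def merge_segments(fields: Fields, first_id: str, second_id: str) -> Fields:
    """Fold second_id into first_id in every field (first_id must come first)."""
    result: Fields = {}
    for name, segments in fields.items():
        updated: dict[str, str] = {}
        for sid, text in segments.items():
            if sid == second_id:
                head = updated.get(first_id, "")
                updated[first_id] = " ".join(part for part in (head, text) if part)
            else:
                updated[sid] = text
        result[name] = updated
    return result
```

Both functions return new dicts rather than editing in place, which makes it easy to validate the result before writing anything back to bilara-data.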

suttacentral/bilara#149

CSS grid UI to drag and drop vertically

Currently we can drag and drop horizontally. However, the nature of the app is such that the amount of information can easily spiral, e.g. a user might want multiple translations readable, as well as root, comments, and variants. Being able to drag and drop columns so they stack underneath each other (with the content interleaved) would be super helpful. This is achievable with CSS grid, and a basic implementation has been done.

suttacentral/bilara#63

collapse columns button

Make a simple UI to quickly collapse and expand columns.

suttacentral/bilara#69

integrate ML content and perhaps other

There is an ongoing effort to use ML for creating draft translations, Linguae Dharmae:

https://github.com/Linguae-Dharmae/chn-machine-translations

Such content will use Bilara's data structure. We want to be able to add such data to the web UI as a suggestion for the translator.

In a way this is just another field of bilara-data, but the repo is external. So we need to be able to designate an external repo for import and reading, but not writing. The ML project will itself re-import the translations made by the translator.

An outdated spec for this is here:

#910

Another suggestion is to use DeepL.

suttacentral/bilara#124

multiple roots

In certain cases we want to be able to use multiple root texts. The immediate use is for the ongoing DPCV project: a 100,000-word manuscript, the oldest Pali manuscript in existence, which is currently being typed by SC volunteers. In other cases, multiple Pali or other editions might be wanted.

The user should be able to display multiple roots if they exist. Superusers, of course, can edit them.

suttacentral/bilara#150

fix routing problems

In some cases the UI doesn't handle routing correctly, especially in dhp, an1, and an2.

suttacentral/bilara#64

click to copy segment number

To copy a raw segment number we should have a simple click to copy in the web UI.

suttacentral/bilara#151

allow translators to edit rootless translations

In the Vinaya there are a number of translations that have no root; these are for inserted headings and the like. There is a bug that prevents translators from editing them.

Note that the new superuser capability will allow a superuser to create more cases like this, e.g. in new root texts.

#785

sync published and unpublished

We should ensure that published and unpublished are in sync by default, unless there is a reason for them not to be.

#786

Add folder _publication-sources to store metadata for complex projects

This will allow us to deal with cases, especially Sanskrit, where each text has a different root source.

#875

deal with localization problem where alphabetic lists differ in different languages

In certain cases, such as lists of definitions or references, the bilara data is sorted alphabetically. But of course the order changes in different languages. We propose a solution where the content is sorted on such pages by the headword rather than the segment ID.
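
The proposed solution might look something like this in Python. The entry structure and `sort_by_headword` are assumptions, and `casefold()` is only a stand-in: a real implementation would likely need locale-aware collation for each language.

```python
def sort_by_headword(entries: list[dict[str, str]]) -> list[dict[str, str]]:
    """Order glossary-style entries by their translated headword.

    Each entry keeps its segment ID, so the data itself is unchanged;
    only the display order differs from the segment order.
    """
    return sorted(entries, key=lambda entry: entry["headword"].casefold())


# Placeholder entries, listed in segment order (alphabetical in the root language).
entries = [
    {"segment_id": "x:1", "headword": "water"},
    {"segment_id": "x:2", "headword": "Fire"},
    {"segment_id": "x:3", "headword": "earth"},
]
for entry in sort_by_headword(entries):
    print(entry["headword"], entry["segment_id"])
```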

#1620

add some keyboard shortcuts

suttacentral/bilara#14

show Pali and other lookups

suttacentral/bilara#20

build a glossary feature

Marking terms used in translation segments will help a translator maintain consistency, and can be harvested to provide terminology definitions on SuttaCentral itself.

suttacentral/bilara#95

ensure progress update works properly

suttacentral/bilara#66

ensure search and TM results update promptly and old results are eliminated

suttacentral/bilara#118

indicate source for TM matches

suttacentral/bilara#67

Note, this will be basically the same thing as indicating when a suggestion is AI.

improve Bilara Home page loading time

The Bilara Home page loads the entire navigation tree for all projects, and as the number of projects increases, the loading time grows a lot.

By default we should load the navigation tree only for the user's projects; other projects should show only the top-level item. They can display further levels on click.
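
A rough sketch of the pruning step, assuming the navigation tree is a list of top-level nodes with `name`, `project`, and `children` keys (a made-up structure for illustration). Other users' projects keep only their top-level item, plus a flag saying there is more to fetch on click.

```python
def initial_tree(tree: list[dict], user_projects: set[str]) -> list[dict]:
    """Return the tree to send on first load of the Home page.

    The user's own projects are sent in full; all others are cut down
    to their top-level node, with `has_more` marking collapsed children.
    """
    pruned = []
    for node in tree:
        if node["project"] in user_projects:
            pruned.append(node)
        else:
            pruned.append({
                "name": node["name"],
                "project": node["project"],
                "children": [],
                "has_more": bool(node["children"]),
            })
    return pruned
```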

suttacentral/bilara#114

forbid comments on Bilara headings

suttacentral/bilara#147

web UI for publishing comments

suttacentral/bilara#141

ensure updates are correctly represented on GA and Bilara UI

suttacentral/bilara#130

test Bilara in Firefox and Safari

So far we have been slack and only tested in Chrome.

suttacentral/bilara#117

Stop Ctrl + F from opening the "how to" dialog on the page.

suttacentral/bilara#115

test for lags in usage

suttacentral/bilara#116

introduce blacklist of forbidden characters in web input
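
A minimal sketch of such a check. The actual list of forbidden characters is not decided here; the set below (zero-width space, non-breaking space, and any control character) is purely an assumption for illustration.

```python
import unicodedata

# Hypothetical blacklist: zero-width space and non-breaking space.
FORBIDDEN = {"\u200b", "\u00a0"}


def find_forbidden(text: str) -> list[tuple[int, str]]:
    """Return (position, character) for each forbidden character in text.

    Control characters (Unicode category "Cc", e.g. tab) are also rejected.
    """
    return [
        (index, char)
        for index, char in enumerate(text)
        if char in FORBIDDEN or unicodedata.category(char) == "Cc"
    ]


print(find_forbidden("a\u200bb\tc"))
```

Returning positions rather than a plain yes/no lets the UI point the translator to the exact spot.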

suttacentral/bilara#113

add search to Home page

suttacentral/bilara#110

See HTML in bilara

suttacentral/bilara#99

notifications for translators

Many translators use a translation as their source. Eg. they will translate to French from English. In such cases, they should be notified if there is a change to the source translation on which they rely.
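
One way to detect such changes is to remember a fingerprint of the source segment at the time it was translated, and compare later. This is only a sketch: `fingerprint`, `changed_sources`, and where the stored fingerprints live are all assumptions.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable hash of a segment's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_sources(source: dict[str, str], seen: dict[str, str]) -> list[str]:
    """List segment IDs whose source text differs from what the translator saw.

    `seen` maps segment ID to the fingerprint recorded when the translator
    last worked on that segment. Segments never seen are not reported.
    """
    return [
        seg_id
        for seg_id, text in source.items()
        if seg_id in seen and seen[seg_id] != fingerprint(text)
    ]
```

The returned IDs are what the notification would list, e.g. for a French translator whose English source has since been revised.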

suttacentral/bilara#91

fix loss-of-focus bug

suttacentral/bilara#80

add plus sign

suttacentral/bilara#78

fix dialogue design bug

suttacentral/bilara#79

There is a proposed fix somewhere.

dev install on mac and windows

suttacentral/bilara#125