Bilara upgrade roadmap
sujato opened this issue · 0 comments
For the second half of 2023, we will focus on a slate of improvements, extensions, and upgrades to Bilara. This aims to make it a more flexible and fully-featured CAT webapp for SuttaCentral's needs and for further envisaged projects.
This page gives an overview, which will be fleshed out in separate issues.
Note that the issues are the definitive record; this is just an overview.
current
Bilara coordinates data in sequential "segments". Content is drawn from bilara-data, and the webapp displays at minimum "root" and "translation" text for the user.
- "Root" is read-only.
- "Translation" is read/write.
A "project" is set up on Bilara for the user, which defines the scope of their translation. A translator may only write to their own project. Authorization is handled by Github. The "translation memory" will suggest previous translations of similar texts by the user.
The user can display other data from bilara-data, such as additional translations, variant readings, HTML, etc., which are all read-only.
Additionally, they can display a "comments" field, in which they can write their own comments.
The backend is similar to SuttaCentral: JSON data, Python scripting, ArangoDB for the database, translation memory, and search, and Lit on the front end.
The web UI has a degree of flexibility:
- drag and drop columns to reorder
- adjust width of search field (which is on every page)
- change colors (using CSS variables)
One characteristic we will keep: the site uses raw HTML for things like select, dialogs, and so on, rather than JS replacements. Keep it simple and robust! It's a utility; we don't need to win any style awards.
Note that we require no backwards compatibility. Use modern standards for evergreen browsers.
Below I list the major new updates desired.
redo architecture with HTMX
Rather than a partial SPA style architecture, we propose using HTMX, with HTML served over the wire. This will simplify routing and queries.
- JSON data is consumed by
- Python, which populates
- ArangoDB, which is queried with
- AQL (ArangoDB Query Language, which resembles SQL but is a separate language), and served to
- HTMX in the front end.
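As a rough illustration of the flow above, here is a minimal sketch in Python. It is not the Bilara implementation: the sample data, the `/segment/...` route, and the function names are all hypothetical, and an in-memory list stands in for the ArangoDB/AQL step. The point is only that the server returns an HTML fragment, which HTMX swaps into the page.

```python
import json
from html import escape

# Hypothetical sample in the bilara-data style: segment ID -> text.
ROOT_JSON = '{"mn1:0.1": "Majjhima Nikāya 1", "mn1:1.1": "Evaṁ me sutaṁ"}'
TRANSLATION_JSON = '{"mn1:0.1": "Middle Discourses 1", "mn1:1.1": "So I have heard."}'


def load_segments(root_json: str, translation_json: str) -> list[dict]:
    """Merge root and translation JSON into one ordered list of rows.

    In the real pipeline this data would be loaded into ArangoDB and
    fetched with AQL; a plain list stands in for that here.
    """
    root = json.loads(root_json)
    translation = json.loads(translation_json)
    return [
        {"id": seg_id, "root": text, "translation": translation.get(seg_id, "")}
        for seg_id, text in root.items()
    ]


def render_rows(rows: list[dict]) -> str:
    """Render rows as an HTML fragment: root is read-only, translation is editable."""
    parts = []
    for row in rows:
        seg_id = escape(row["id"], quote=True)
        parts.append(
            f'<div class="segment" id="{seg_id}">'
            f'<span class="root">{escape(row["root"])}</span>'
            # hx-post / hx-trigger are HTMX attributes; the URL is an assumption.
            f'<textarea name="translation" hx-post="/segment/{seg_id}" '
            f'hx-trigger="change">{escape(row["translation"])}</textarea>'
            "</div>"
        )
    return "\n".join(parts)


fragment = render_rows(load_segments(ROOT_JSON, TRANSLATION_JSON))
```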
For the front end, build with HTML-first.
- structure content with semantic HTML
- style with CSS
- use JS only when absolutely necessary.
corrections/suggestions
Currently there is no method for making suggestions such as a proofreader or checker would do. We need a simple web UI that will allow this. The suggestions should show up as a list available on the Home page, and the translator should be able to easily accept, reject, or edit them.
Suggestions need not be stored on Github, they can just live in the database.
Suggestions can be made by:
- any translator
- any proofreader (with proof rights as below)
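To make the accept/reject/edit flow concrete, here is a small sketch in Python. The `Suggestion` fields, the status values, and the `resolve` function are assumptions for illustration only; the actual storage would be whatever the database schema ends up being.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    segment_id: str
    author: str
    proposed_text: str
    status: str = "pending"  # hypothetical states: pending | accepted | rejected


def resolve(suggestion, translation, action, edited_text=None):
    """Apply the translator's decision; `translation` maps segment ID to text."""
    if suggestion.status != "pending":
        raise ValueError("suggestion already resolved")
    if action == "accept":
        translation[suggestion.segment_id] = suggestion.proposed_text
        suggestion.status = "accepted"
    elif action == "edit":
        # The translator accepts the idea but rewrites the wording.
        if edited_text is None:
            raise ValueError("edit requires edited_text")
        translation[suggestion.segment_id] = edited_text
        suggestion.status = "accepted"
    elif action == "reject":
        suggestion.status = "rejected"
    else:
        raise ValueError(f"unknown action: {action}")
    return translation


translation = {"mn1:1.1": "Thus have I heard."}
suggestion = Suggestion("mn1:1.1", "proofreader1", "So I have heard.")
resolve(suggestion, translation, "accept")
```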
create three grades of user: proof, normal, and super
Currently all users have the same rights, allowing them to edit their own translation but no other.
The suggestions UI will require a new kind of user, one who can make suggestions only but not edit text or translation.
In addition, we want a superuser who can edit root text and HTML, etc. Thus we need three grades:
- normal user (write their project translation and comments, read everything else)
- proof user (write suggestions, read everything else)
- superuser (write anything)
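The three grades could be expressed as a single permission check, sketched below in Python. The field names (`"translation"`, `"comment"`, `"suggestion"`, `"root"`) and the `can_write` function are hypothetical. The sketch also assumes that normal users may write suggestions, since any translator can make them.

```python
from enum import Enum


class Grade(Enum):
    PROOF = "proof"
    NORMAL = "normal"
    SUPER = "super"


def can_write(grade: Grade, field: str, owns_project: bool = False) -> bool:
    """Return True if a user of this grade may write to the given field."""
    if grade is Grade.SUPER:
        return True  # superusers write anything, including root and HTML
    if grade is Grade.PROOF:
        return field == "suggestion"  # suggestions only
    # Normal user: translation and comments in their own project,
    # plus suggestions (assumed, since any translator may suggest).
    if field in {"translation", "comment"}:
        return owns_project
    return field == "suggestion"
```

For example, `can_write(Grade.NORMAL, "translation", owns_project=True)` is true, while `can_write(Grade.PROOF, "translation", owns_project=True)` is false.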
make spreadsheet-like ability to split and merge rows
The main purpose of this is for expert speakers of root languages, so that they can adjust the segmentation of the text via the web app. This is especially important when adding new root texts.
Superusers will be able to split, edit, and combine the rows of the text, a bit like the "add new row" and "combine row" functions of a spreadsheet. Currently, to do this we use Bilara i/o, which exports the data as TSV; we then edit it in a spreadsheet and import it again. This is powerful, but clumsy and error-prone, and not something a regular user would do.
The important thing is that all associated Bilara-data is updated properly.
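A minimal sketch of the two operations in Python, assuming each row is a dict of field name to text (root, translation, and so on). The function names are hypothetical, and renumbering segment IDs is deliberately left out. What the sketch shows is that every operation acts on all fields of a row at once, so the associated data stays aligned.

```python
def split_row(rows, index, field, offset):
    """Split rows[index] in two at `offset` characters into `field`.

    Other fields stay on the first row and are left empty on the new row.
    """
    row = rows[index]
    text = row[field]
    first = {**row, field: text[:offset].rstrip()}
    second = {key: "" for key in row}
    second[field] = text[offset:].lstrip()
    return rows[:index] + [first, second] + rows[index + 1:]


def merge_rows(rows, index):
    """Merge rows[index] and rows[index + 1], joining every field with a space."""
    a, b = rows[index], rows[index + 1]
    merged = {
        key: " ".join(part for part in (a.get(key, ""), b.get(key, "")) if part)
        for key in {**a, **b}
    }
    return rows[:index] + [merged] + rows[index + 2:]


rows = [{"root": "Evaṁ me sutaṁ ekaṁ samayaṁ", "translation": "So I have heard."}]
offset = rows[0]["root"].index(" ekaṁ")
split = split_row(rows, 0, "root", offset)
restored = merge_rows(split, 0)
```

Here the split leaves the translation on the first row and creates an empty translation on the second; merging the two rows again restores the original.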
CSS grid UI to drag and drop vertically
Currently we can drag and drop horizontally. However the nature of the app is such that the amount of information can easily spiral, eg. a user might want multiple translations readable, as well as root and comments and variants. Being able to drag and drop columns so they stack underneath each other (with the content interleaved) would be super helpful. This is achievable with CSS grid, and a basic implementation has been done.
collapse columns button
Make a simple UI to quickly collapse and expand columns.
integrate ML content and perhaps other
There is an ongoing effort to use ML for creating draft translations, Linguae Dharmae:
https://github.com/Linguae-Dharmae/chn-machine-translations
Such content will use Bilara's data structure. We want to be able to add such data to the web UI as a suggestion for the translator.
In a way this is just another field of bilara-data, but the repo is external. So we need to be able to designate an external repo for import and reading, but not writing. The ML project itself will re-import the translations made by the translator.
An outdated spec for this is here:
Another suggestion is to use DeepL
multiple roots
In certain cases we want to be able to use multiple root texts. The immediate use is for the ongoing DPCV project: a 100,000-word manuscript, the oldest Pali manuscript in existence, which is currently being typed by SC volunteers. In other cases, multiple Pali or other editions might be wanted.
The user should be able to display multiple roots if they exist. Superusers, of course, can edit them.
fix routing problems
In some cases the UI doesn't handle routing correctly, especially in dhp, an1, and an2.
click to copy segment number
To copy a raw segment number we should have a simple click to copy in the web UI.
allow translators to edit rootless translations
In the Vinaya there are a number of translations that have no root; these are for inserted headings and the like. There is a bug that prevents translators from editing them.
Note that the new superuser capability will allow a superuser to create more cases like this, eg. in new root texts.
sync published and unpublished
We should ensure that published and unpublished are in sync by default, unless there is a reason for them not to be.
Add folder _publication-sources to store metadata for complex projects
This will allow us to deal with cases, especially Sanskrit, where each text has a different root source.
deal with localization problem where alphabetic lists differ in different languages
In certain cases, such as lists of definitions or references, the bilara data is sorted alphabetically. But of course the order changes in different languages. We propose a solution where the content is sorted on such pages by the headword rather than the segment ID.
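A small Python sketch of the idea, with hypothetical entry IDs and field names. The entries are numbered in the alphabetical order of the root-language terms, and the translated list is re-sorted by its own headword. `str.casefold` is only a stand-in for proper collation, which differs by language and would need a locale-aware comparison.

```python
def sort_by_headword(entries):
    """Sort glossary-style entries by translated headword, not by segment ID."""
    return sorted(entries, key=lambda entry: entry["headword"].casefold())


# Segment order follows the root terms (citta, dukkha, sīla),
# so the English headwords arrive out of alphabetical order.
english = [
    {"id": "glossary:1", "headword": "mind"},
    {"id": "glossary:2", "headword": "suffering"},
    {"id": "glossary:3", "headword": "ethics"},
]

ordered_ids = [entry["id"] for entry in sort_by_headword(english)]
```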
add some keyboard shortcuts
show Pali and other lookups
build a glossary feature
Marking terms used in translation segments will help a translator maintain consistency, and can be harvested to provide terminology definitions on SuttaCentral itself.
ensure progress update works properly
ensure search and TM results update promptly and old results are eliminated.
indicate source for TM matches
Note, this will be basically the same thing as indicating when a suggestion is AI.
improve Bilara Home page loading time
The Bilara Home page loads the entire navigation tree for all projects, and as the number of projects increases, the loading time grows a lot.
By default, we should load the navigation tree only for the user's own projects; other projects should show only the top-level item. They can display further on click.
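One way this could look on the server side, sketched in Python. The node shape (`id`, `name`, `children`) and the `has_more` flag are assumptions, not the current Bilara data model; `has_more` is a hint that the UI can fetch the children on click.

```python
def prune_navigation(tree, user_projects):
    """Keep full subtrees for the user's projects, top-level stubs otherwise."""
    pruned = []
    for node in tree:
        if node["id"] in user_projects:
            pruned.append(node)
        else:
            pruned.append({
                "id": node["id"],
                "name": node["name"],
                "children": [],
                # Tells the UI that children exist and can be loaded on demand.
                "has_more": bool(node.get("children")),
            })
    return pruned


tree = [
    {"id": "project-a", "name": "Project A",
     "children": [{"id": "mn", "name": "MN", "children": []}]},
    {"id": "project-b", "name": "Project B",
     "children": [{"id": "dn", "name": "DN", "children": []}]},
]
nav = prune_navigation(tree, {"project-a"})
```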
forbid comments on Bilara headings
web UI for publishing comments
ensure updates are correctly represented on GA and Bilara UI
test Bilara in Firefox and Safari
So far we have been slack and only tested in Chrome.
Stop Ctrl + F on page opening the "how to" dialog.
test for lags in usage
introduce blacklist of forbidden characters in web input
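A possible shape for the check, sketched in Python. The actual list of forbidden characters has not been decided, so the set below (zero-width space, byte order mark, non-breaking space, plus all control characters) is purely a placeholder.

```python
import unicodedata

# Placeholder blacklist: zero-width space, byte order mark, non-breaking space.
FORBIDDEN = {"\u200b", "\ufeff", "\u00a0"}


def find_forbidden(text):
    """Return (position, character) pairs for every forbidden character.

    Control characters (Unicode category Cc, which includes tab and newline)
    are also flagged, on the assumption that a segment is a single line.
    """
    return [
        (i, ch)
        for i, ch in enumerate(text)
        if ch in FORBIDDEN or unicodedata.category(ch) == "Cc"
    ]
```

Returning positions rather than silently stripping the characters lets the UI show the translator exactly what was rejected.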
add search to Home page
See HTML in bilara
notifications for translators
Many translators use a translation as their source. Eg. they will translate to French from English. In such cases, they should be notified if there is a change to the source translation on which they rely.
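Detecting what to notify about could be as simple as comparing two versions of the source translation, as in this Python sketch. The function name and sample data are hypothetical; how the notification is delivered is left open.

```python
def changed_segments(old, new):
    """Return IDs of segments added, removed, or edited between two versions.

    Both arguments map segment ID to text, as in bilara-data JSON files.
    """
    ids = list(dict.fromkeys([*old, *new]))  # union of IDs, order preserved
    return [seg_id for seg_id in ids if old.get(seg_id) != new.get(seg_id)]


old_english = {"mn1:1.1": "Thus have I heard.", "mn1:1.2": "At one time"}
new_english = {"mn1:1.1": "So I have heard.", "mn1:1.2": "At one time"}

to_review = changed_segments(old_english, new_english)
```

A French translator working from this English source would then be pointed at the segments in `to_review`.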
fix loss-of-focus bug
add plus sign
fix dialogue design bug
There is a proposed fix somewhere.