
A pure JavaScript implementation of the Unicode Bidirectional Algorithm

Primary language: JavaScript. License: MIT.

bidi-js

This is a pure JavaScript implementation of the Unicode Bidirectional Algorithm version 13.0.0. Its goals, in no particular order, are to be:

  • Correct
  • Small
  • Fast

Conformance

This implementation currently conforms to conformance clause UAX9-C1 of the bidi spec (Unicode Standard Annex #9), as verified by running all of the provided conformance tests.

Compatibility

It has no external dependencies and therefore should run just fine in any relatively capable web browser, Node.js, etc. The provided distribution .js files are valid ES5.

Usage

Install it from npm:

npm install bidi-js


Import and initialize:

import bidiFactory from 'bidi-js'
// or: const bidiFactory = require('bidi-js')

const bidi = bidiFactory()

The bidi-js package's only export is a factory function which you must invoke to return a bidi object; that object exposes the methods for bidi processing.

(Why a factory function? The main reason is to ensure the entire module's code is wrapped within a single self-contained function with no closure dependencies. This enables that function to be stringified and passed into a web worker, for example.)
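The idea can be illustrated with a toy factory. This is a hypothetical stand-in, not the bidi-js factory itself: because the function references nothing outside its own body, its source text can be turned back into a working function in a fresh scope, which is roughly what a worker bootstrap script would do.

```javascript
// A toy, self-contained factory (hypothetical stand-in for bidiFactory):
// everything the returned object needs is defined inside the function body.
function toyFactory() {
  function isOdd(level) {
    return (level & 1) === 1
  }
  return {
    isRtlLevel: isOdd
  }
}

// Stringify the factory. In a browser this source string could be embedded in a
// Blob and loaded as a Web Worker script; here we simply re-evaluate it.
// new Function() evaluates in the global scope, so this only works because
// toyFactory has no closure dependencies.
const source = toyFactory.toString()
const rebuiltFactory = new Function(`return (${source})`)()

const toy = rebuiltFactory()
console.log(toy.isRtlLevel(1)) // true
console.log(toy.isRtlLevel(2)) // false
```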

Now that you have the bidi object, you can:

Calculate bidi embedding levels

const embeddingLevels = bidi.getEmbeddingLevels(
  text, //the input string containing mixed-direction text
  explicitDirection //"ltr" or "rtl" if you don't want to auto-detect it
)

const { levels, paragraphs } = embeddingLevels

The result object embeddingLevels will usually be passed to other functions described below. Its contents, should you need to inspect them individually, are:

  • levels is a Uint8Array holding the calculated bidi embedding levels for each character in the string. The most important thing to know about these levels is that any given character is in a right-to-left scope if its embedding level is an odd number, and left-to-right if it's an even number.

  • paragraphs is an array of {start, end, level} objects, one for each paragraph in the text (paragraphs are separated by explicit breaking characters, not soft line wrapping). The start and end indices are inclusive, and level is the resolved base embedding level of that paragraph.
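To make the odd/even rule concrete, here is a minimal sketch that groups a levels array into runs of equal level and labels each run's direction. The `levelRuns` helper is hypothetical (not part of bidi-js), and the Uint8Array below is hard-coded for illustration rather than real `getEmbeddingLevels` output.

```javascript
// Split a levels array into runs of equal embedding level and label each run's
// direction: odd level => "rtl", even level => "ltr". (Hypothetical helper.)
function levelRuns(levels) {
  const runs = []
  let start = 0
  for (let i = 1; i <= levels.length; i++) {
    // Close the current run at the end of input or when the level changes
    if (i === levels.length || levels[i] !== levels[start]) {
      runs.push({
        start,
        end: i - 1, // inclusive, matching the paragraphs convention
        level: levels[start],
        direction: levels[start] & 1 ? 'rtl' : 'ltr'
      })
      start = i
    }
  }
  return runs
}

// Hypothetical levels for a 7-character string: 3 LTR, 2 RTL, then 2 LTR characters
const exampleLevels = new Uint8Array([0, 0, 0, 1, 1, 0, 0])
console.log(levelRuns(exampleLevels))
```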

Calculate character reorderings

const flips = bidi.getReorderSegments(
  text, //the full input string
  embeddingLevels //the full result object from getEmbeddingLevels
)

// Process all reversal sequences, in order:
flips.forEach(range => {
  const [start, end] = range
  // Reverse this sequence of characters from start to end, inclusive
  for (let i = start; i <= end; i++) {
    //...
  }
})

Each "flip" is a range that should be reversed in place; they must all be applied in order.
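The elided loop body above can be filled in along these lines. This is a sketch under the assumption that you are reordering a plain array (of characters, glyphs, or similar); `applyFlips` is a hypothetical helper, and the flips passed to it are hard-coded for illustration, not real `getReorderSegments` output.

```javascript
// Apply reversal ranges to an array, in order.
// Each flip is a [start, end] pair with an inclusive end index.
function applyFlips(items, flips) {
  const result = items.slice() // leave the input array untouched
  for (const [start, end] of flips) {
    // Reverse result[start..end] in place by swapping from both ends
    for (let i = start, j = end; i < j; i++, j--) {
      const tmp = result[i]
      result[i] = result[j]
      result[j] = tmp
    }
  }
  return result
}

// Hypothetical flips, hard-coded for illustration
const reordered = applyFlips(['a', 'b', 'c', 'd', 'e'], [[1, 3], [0, 4]])
console.log(reordered.join('')) // "ebcda"
```

Swapping from both ends avoids allocating a temporary slice for every flip.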

Sometimes you don't want to process the whole string at once, but just a particular substring. A common example is text that has been line-wrapped: each line must then be processed individually, in particular so that trailing whitespace at the end of each line gets the special handling the algorithm requires. For this you can pass the extra start and end parameters:

yourWrappedLines.forEach(([lineStart, lineEnd]) => {
  const flips = bidi.getReorderSegments(
    text,
    embeddingLevels,
    lineStart,
    lineEnd //inclusive
  )
  // ...process flips for this line
})
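The loop above assumes `yourWrappedLines` already holds `[lineStart, lineEnd]` pairs with inclusive ends. If your own wrapping step instead produces line lengths (an assumption about your layout code, not something bidi-js provides), the off-by-one conversion can be sketched with a hypothetical `lineRanges` helper:

```javascript
// Convert wrapped line lengths into [lineStart, lineEnd] pairs of string
// indices, where lineEnd is inclusive. (Hypothetical helper.)
function lineRanges(lineLengths) {
  const ranges = []
  let offset = 0
  for (const length of lineLengths) {
    // Empty lines contain no characters to reorder, so they are skipped
    if (length > 0) {
      ranges.push([offset, offset + length - 1]) // inclusive end
    }
    offset += length
  }
  return ranges
}

// Hypothetical wrapping of a 23-character string into lines of 10, 10 and 3 characters
console.log(lineRanges([10, 10, 3])) // [ [ 0, 9 ], [ 10, 19 ], [ 20, 22 ] ]
```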

Handle right-to-left mirrored characters

Some characters that resolve to right-to-left need to be swapped with their "mirrored" counterparts; opening and closing parentheses are a common example. You can determine all the characters that need to be mirrored like so:

const mirrored = bidi.getMirroredCharactersMap(
  text,
  embeddingLevels
)

This returns a Map of numeric character indices to replacement characters.
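Applying such a Map to a string can be sketched as follows. The `applyMirroring` helper is hypothetical, and it assumes the Map's keys use the same indexing as `text[index]` (UTF-16 code units). Because those keys refer to positions in the original logical string, this sketch substitutes characters before any reordering. The Map below is hard-coded for illustration.

```javascript
// Build a new string with mirrored replacements substituted in.
// `mirrored` maps numeric string indices to replacement characters.
function applyMirroring(text, mirrored) {
  const chars = text.split('') // one entry per UTF-16 code unit
  mirrored.forEach((replacement, index) => {
    chars[index] = replacement
  })
  return chars.join('')
}

// Hypothetical map: pretend the parentheses at indices 0 and 4 resolved to RTL
const exampleMirrored = new Map([[0, ')'], [4, '(']])
console.log(applyMirroring('(abc)', exampleMirrored)) // ")abc("
```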

You can also process just a substring with extra start and end parameters:

const mirrored = bidi.getMirroredCharactersMap(
  text,
  embeddingLevels,
  start,
  end //inclusive
)

If you'd rather process mirrored characters individually, you can use the single getMirroredCharacter function. Just make sure you only call it for right-to-left characters (those whose embedding level is an odd number). It returns null if the character doesn't support mirroring.

const mirroredChar = (embeddingLevels.levels[charIndex] & 1) //odd number means RTL
    ? bidi.getMirroredCharacter(text[charIndex])
    : null

Get a character's bidi type

This is used internally, but you can also ask for the "bidi character type" of any character, should you need it:

const bidiType = bidi.getBidiCharTypeName(text[charIndex])
// e.g. "L", "R", "AL", "NSM", ...