
Highly resilient encoding for human input

Primary language: C#. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

MultiCode

Data encoding for human input

A combination of Reed-Solomon forward error correction (FEC) codes and a specific binary-to-text encoding that allows common human-input errors to be detected and possibly corrected.

The result is a highly resilient code that is very likely to be read back correctly, even when it was typed with mistakes.

A prototype is available at https://jsfiddle.net/i_e_b/x1vru8bc/ where you can experiment with it.

Design

Before the input is passed to the FEC decoder (in this case, Reed-Solomon), we look for patterns in it that indicate common input mistakes and try to correct for them. This increases the chance that the FEC step can successfully correct the input.

We start with 32 characters: the ASCII alphanumeric set (10 digits and 26 letters) with the indistinct glyphs O, L, I, and U removed. These are split into an 'odd' set and an 'even' set of 16 characters each, so each character encodes a 4-bit group.

 0 1 2 3 6 7 8 9 b G J N q X Y Z
4 5 A C D E F H K M P R s T V W
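A minimal Python sketch of the alphabet and the 4-bit grouping follows. It assumes the first row above is the 'odd' set and the second row the 'even' set, that the first character of a code comes from the 'odd' set (as in the o/e patterns of the error examples below), and that each byte is written high nibble first. The names `ODD`, `EVEN`, `encode`, and `decode` are illustrative and not taken from the MultiCode source; Reed-Solomon parity symbols and '-' separators are left out.

```python
import string

# Assumption: first row = 'odd' set, second row = 'even' set.
ODD = "01236789bGJNqXYZ"
EVEN = "45ACDEFHKMPRsTVW"

# Alphabet sanity checks: 16 symbols per set (4 bits each), and together they
# are the 36 ASCII alphanumerics minus O, L, I, U (ignoring case).
assert len(ODD) == len(EVEN) == 16
assert set((ODD + EVEN).upper()) == set(string.digits + string.ascii_uppercase) - set("OLIU")


def encode(data: bytes) -> str:
    """Emit one character per 4-bit group, alternating odd/even sets."""
    out = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):  # assumption: high nibble first
            alphabet = ODD if len(out) % 2 == 0 else EVEN
            out.append(alphabet[nibble])
    return "".join(out)


def decode(code: str) -> bytes:
    """Inverse of encode() for error-free input; raises ValueError otherwise."""
    nibbles = []
    for i, ch in enumerate(code):
        alphabet = ODD if i % 2 == 0 else EVEN
        nibbles.append(alphabet.index(ch))  # ValueError if ch is in the wrong set
    if len(nibbles) % 2:
        raise ValueError("odd number of characters")
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(encode(b"\x00\xff"))  # 04ZW
```

Because every byte becomes two characters, an error-free code produced this way always alternates odd/even from its first character, and `decode(encode(b"MultiCode"))` returns `b"MultiCode"`.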

S, Q, and B are shown in lower case (s, q, b) to avoid confusion with 5, 0, and 8. Because two characters from the same set are never placed next to each other, we can choose how characters are distributed between the two sets to reduce the chance of accidental obscenities. The likelihood of accidentally forming words is already quite low with this character set.
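The lower-case s, q, and b only change how codes are displayed. The error examples below are written with upper-case S and Q, so a parser presumably accepts either case; the sketch below assumes that, and also assumes '-' is only a visual separator. `build_lookup`, `LOOKUP`, and `parse` are hypothetical names, and the odd/even assignment again takes the first row above as 'odd'.

```python
from typing import Optional

ODD = "01236789bGJNqXYZ"   # assumed 'odd' set (first row above)
EVEN = "45ACDEFHKMPRsTVW"  # assumed 'even' set (second row above)


def build_lookup() -> dict[str, tuple[str, int]]:
    """Map every accepted character, in either case, to (set, 4-bit value)."""
    table = {}
    for parity, alphabet in (("o", ODD), ("e", EVEN)):
        for value, ch in enumerate(alphabet):
            table[ch.upper()] = (parity, value)
            table[ch.lower()] = (parity, value)
    return table


LOOKUP = build_lookup()


def parse(text: str) -> list[Optional[tuple[str, int]]]:
    """Classify each typed character, skipping '-' separators. None marks a
    character outside the alphabet (e.g. O, L, I, U); treating those as
    erasures for Reed-Solomon is an assumption of this sketch."""
    return [LOOKUP.get(ch) for ch in text if ch != "-"]
```

Since the 32 upper-cased symbols are all distinct, accepting both cases never makes two symbols collide.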

Generated codes alternate between the two sets, so an input that does not follow this alternation is known to contain mistakes. One shortcoming is that we cannot tell whether a pair of characters was deleted from the start or from the end of the input, because both leave the alternation intact. To work around this, we try rotating the input during the Reed-Solomon step, up to the number of deleted characters.
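The alternation check and the start/end rotation can be sketched as follows. This is an illustrative Python version, not the project's implementation: it takes the first row of the odd/even listing above as 'odd', assumes a valid code starts with an 'odd' character, and uses '_' as a placeholder (erasure) for the Reed-Solomon step, as the error examples below do. `pattern`, `alternation_breaks`, and `rotation_candidates` are hypothetical helpers.

```python
ODD = "01236789bGJNqXYZ"   # assumed 'odd' set
EVEN = "45ACDEFHKMPRsTVW"  # assumed 'even' set
PARITY = {**{c.upper(): "o" for c in ODD}, **{c.upper(): "e" for c in EVEN}}


def pattern(code: str) -> str:
    """Return the o/e pattern of a code; '?' for characters outside the alphabet."""
    return "".join(PARITY.get(c.upper(), "?") for c in code if c != "-")


def alternation_breaks(pat: str) -> list[int]:
    """Indices i where pat[i] == pat[i + 1], i.e. two same-set characters touch."""
    return [i for i in range(len(pat) - 1) if pat[i] == pat[i + 1] and pat[i] != "?"]


def rotation_candidates(chars: str, expected_len: int, placeholder: str = "_") -> list[str]:
    """Spread the missing characters over the start and end of an input whose
    alternation is intact (separators already removed). Only splits that keep
    the first real character on a slot of its own set are kept; each remaining
    arrangement is a candidate for the Reed-Solomon step to try."""
    missing = expected_len - len(chars)
    if missing <= 0 or not chars:
        return [chars]
    first_is_odd = PARITY.get(chars[0].upper()) == "o"
    return [
        placeholder * k + chars + placeholder * (missing - k)
        for k in range(missing + 1)
        if (k % 2 == 0) == first_is_odd  # even k puts chars[0] on an 'odd' slot
    ]
```

For a hypothetical valid code `7M2-4G5-ZA9`, `pattern` gives `oeoeoeoeo`. If its first two characters are lost, `rotation_candidates("24G5ZA9", 9)` returns `["24G5ZA9__", "__24G5ZA9"]`: exactly the start/end ambiguity that the rotation has to resolve.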

Fixed guard characters at the start and end of the code could remove this ambiguity, but they are not implemented in this project.

Error Examples

oeo-eoe-oeo -- odd-even pattern is correct, length is correct
Real input: 7MQ-6DJ-S01
 
_eo-eoe-oeo -- pattern is inverted. First char missing, put in placeholder for Reed-Solomon
Deleted first char: _MQ-6DJ-S01
 
oeo-_oe-oeo -- "ee" or "oo" around deletion point
Deleted middle char: 7MQ-_DJ-S01
 
oeo-eoe-oe_ -- all correct, but wrong length
Deleted end char: 7MQ-6DJ-S0_
 
oeo-eeo-oeo -- "eeoo" one before transposition
Transposed char: 7MQ-6JD-S01
oeo-oee-oeo -- "ooee" one before transposition
7MQ-D6J-S01
 
oeo-eeo-eoo -- "ee" at start, "oo" at end
Double transposed: 7MQ-6JD-0S1
oeo-oeo-eeo -- "oo" at start, "ee" at end
7MQ-D6S-J01
 
oeo-eooe-oeo -- too long. Error at first repeated o/e
Insertion: 7MQ-6DDJ-S01
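The rules these examples illustrate can be written as a small heuristic over the o/e pattern and the expected length. This is a hedged Python sketch that only reproduces the cases listed above; the function name `diagnose` and its return values are hypothetical. Positions are 0-based, and a real decoder would still hand its guess to Reed-Solomon, which does the actual correction.

```python
from typing import Optional


def diagnose(pat: str, expected_len: int) -> tuple[str, Optional[int]]:
    """Guess the error kind and 0-based position from an o/e pattern such as
    'oeo-eeo-oeo'. Heuristic sketch covering the README examples only."""
    pat = pat.replace("-", "")
    breaks = [i for i in range(len(pat) - 1) if pat[i] == pat[i + 1]]

    if len(pat) < expected_len:            # something was deleted
        if pat.startswith("e"):
            return ("deleted", 0)          # inverted pattern: first char missing
        if breaks:
            return ("deleted", breaks[0] + 1)  # "ee"/"oo" around the deletion point
        return ("deleted", len(pat))       # pattern fine but short: end char missing

    if len(pat) > expected_len:            # something was inserted
        if pat.startswith("e"):
            return ("inserted", 0)
        return ("inserted", breaks[0] if breaks else len(pat) - 1)  # first repeat

    if not breaks:
        return ("ok", None)
    if len(breaks) >= 2 and pat[breaks[0]] != pat[breaks[1]]:
        # "ee...oo" or "oo...ee": the characters between the two breaks sit on
        # slots of the wrong set (one transposition gives "eeoo"/"ooee").
        return ("transposed", breaks[0] + 1)
    return ("unknown", breaks[0])
```

Two characters deleted from the start also produce `("deleted", len(pat))`, the same answer as an end deletion; that is the ambiguity the rotation step described in the Design section has to resolve.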

The weird implementation

The implementations are generally not idiomatic for their languages. They are written to be portable with minimal effort: they rely only on being able to create arrays, and everything else they need is bundled.