Data encoding for human input
This project combines Reed-Solomon forward-error-correction (FEC) codes with a specific binary-to-text encoding that allows common human-input errors to be detected and, where possible, corrected.
The result is a code designed to survive the kinds of mistakes people make when typing or transcribing it.
The prototype of this is at https://jsfiddle.net/i_e_b/x1vru8bc/ where you can play around with it.
Before passing the input to the FEC stage (in this case, Reed-Solomon), we look for patterns in the input and try to correct for them, which increases the chance of the FEC successfully correcting the input.
Start with the ASCII alphanumeric set (ignoring case) with the indistinct glyphs O, L, I and U removed, which leaves 32 characters.
Then split these into an 'odd' and an 'even' set of 16 characters each, so that each character carries 4 bits:
0 1 2 3 6 7 8 9 b G J N q X Y Z
4 5 A C D E F H K M P R s T V W
S, Q, and B are presented as lower case
to prevent confusion with 5, 0, and 8.
Because two characters from the same set are never adjacent in a valid code, we can choose which characters
go into each set to reduce the chance of accidental obscenity. The likelihood of accidentally forming a word is already
quite low with this set.
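As a rough illustration of the two-set alphabet, the sketch below maps 4-bit values to characters, taking the set from the position in the output. Several details are assumptions rather than something stated here: the function names `encode_nybbles` and `decode_chars` are hypothetical, the value of each character is taken to be its index in the rows listed above, the first character is drawn from the 'odd' set, and input is matched case-insensitively.

```python
# Character sets copied from the two rows above.
# Assumption: a character's 4-bit value is its index within its row.
ODD_SET = "01236789bGJNqXYZ"
EVEN_SET = "45ACDEFHKMPRsTVW"

def encode_nybbles(nybbles):
    """Map 4-bit values to characters, alternating between the two sets.

    Assumption: position 0 uses the odd set, position 1 the even set, and so on.
    """
    out = []
    for i, value in enumerate(nybbles):
        if not 0 <= value <= 15:
            raise ValueError(f"not a 4-bit value: {value}")
        table = ODD_SET if i % 2 == 0 else EVEN_SET
        out.append(table[value])
    return "".join(out)

def normalise(ch):
    """Upper-case a character, except S, Q and B, which are kept lower case."""
    upper = ch.upper()
    return upper.lower() if upper in "SQB" else upper

def decode_chars(text):
    """Recover the 4-bit values from a code, ignoring '-' separators.

    This does no error handling beyond rejecting unknown characters;
    it does not check that the two sets alternate.
    """
    values = []
    for ch in text:
        if ch == "-":
            continue
        ch = normalise(ch)
        if ch in ODD_SET:
            values.append(ODD_SET.index(ch))
        elif ch in EVEN_SET:
            values.append(EVEN_SET.index(ch))
        else:
            raise ValueError(f"character not in either set: {ch!r}")
    return values
```

With these assumptions, `encode_nybbles([0, 0, 15])` gives `"04Z"`, and `decode_chars("B-S")` gives `[8, 12]`, the same as for `"b-s"`.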
Generated codes alternate between the two sets, so an input that does not follow this alternation is known to contain mistakes. One shortcoming is that a pair of characters deleted from the start of the input cannot be told apart from a pair deleted from the end: in both cases the alternation is intact and only the length is wrong. To handle this, we try rotating the input during the Reed-Solomon step, up to the number of deleted characters.
We could try having fixed guard codes at the start and end, but this is not implemented in this project.
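The alternation check can be sketched as follows. This is a minimal illustration rather than the project's implementation: `parity_pattern` and `alternation_breaks` are hypothetical names, and the code assumes the input has already been normalised to the casing used in the two sets.

```python
# Character sets copied from the rows earlier in this document.
ODD_SET = "01236789bGJNqXYZ"
EVEN_SET = "45ACDEFHKMPRsTVW"

def parity_pattern(text):
    """Return one 'o' or 'e' per character, skipping '-' separators.

    Characters in neither set are reported as '?'.
    """
    out = []
    for ch in text:
        if ch == "-":
            continue
        if ch in ODD_SET:
            out.append("o")
        elif ch in EVEN_SET:
            out.append("e")
        else:
            out.append("?")
    return "".join(out)

def alternation_breaks(pattern):
    """Return every index i where pattern[i] and pattern[i + 1] are the same set."""
    return [
        i
        for i in range(len(pattern) - 1)
        if pattern[i] in "oe" and pattern[i] == pattern[i + 1]
    ]
```

An empty result from `alternation_breaks` means the alternation is intact, which still leaves the length to be checked separately. For example, `alternation_breaks("oeoeeooeo")` returns `[3, 5]`, marking the "ee" and "oo" pairs.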
| Case | Input | Pattern | Notes |
|---|---|---|---|
| Real input: | 7MQ-6DJ-S01 | oeo-eoe-oeo | odd-even pattern is correct, length is correct |
| Deleted first char: | _MQ-6DJ-S01 | _eo-eoe-oeo | pattern is inverted. First char missing, put in placeholder for Reed-Solomon |
| Deleted middle char: | 7MQ-_DJ-S01 | oeo-_oe-oeo | "ee" or "oo" around deletion point |
| Deleted end char: | 7MQ-6DJ-S0_ | oeo-eoe-oe_ | all correct, but wrong length |
| Transposed char: | 7MQ-6JD-S01 | oeo-eeo-oeo | "eeoo" one before transposition |
| | 7MQ-D6J-S01 | oeo-oee-oeo | "ooee" one before transposition |
| Double transposed: | 7MQ-6JD-0S1 | oeo-eeo-eoo | "ee" at start, "oo" at end |
| | 7MQ-D6S-J01 | oeo-oeo-eeo | "oo" at start, "ee" at end |
| Insertion: | 7MQ-6DDJ-S01 | oeo-eooe-oeo | too long. Error at first repeated o/e |
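The three single-deletion cases above could be handled along these lines before the Reed-Solomon step. This is only a sketch under stated assumptions: `mark_single_deletion` is a hypothetical name, the input has had its separators removed, exactly one character is missing, and a valid code starts with a character from the 'odd' set. Transpositions and insertions are not handled here.

```python
# Character sets copied from the rows earlier in this document.
ODD_SET = "01236789bGJNqXYZ"
EVEN_SET = "45ACDEFHKMPRsTVW"
PLACEHOLDER = "_"  # marks a position the FEC step should treat as unknown

def set_of(ch):
    """Return 'o' or 'e' for the set a character belongs to, or '?' for neither."""
    if ch in ODD_SET:
        return "o"
    if ch in EVEN_SET:
        return "e"
    return "?"

def mark_single_deletion(code, expected_len):
    """Insert a placeholder where a single missing character most likely was.

    Inputs that are not exactly one character short are returned unchanged.
    """
    if len(code) != expected_len - 1:
        return code
    # Inverted pattern: the first character is missing.
    if code and set_of(code[0]) == "e":
        return PLACEHOLDER + code
    # "ee" or "oo" next to each other: the deletion sits between them.
    for i in range(len(code) - 1):
        kind = set_of(code[i])
        if kind in "oe" and kind == set_of(code[i + 1]):
            return code[: i + 1] + PLACEHOLDER + code[i + 1 :]
    # Pattern is intact but the input is short: the last character is missing.
    return code + PLACEHOLDER
```

For a nine-character code such as `1A3D7F9KG` (built from the two sets above), the inputs `A3D7F9KG`, `1A37F9KG` and `1A3D7F9K` become `_A3D7F9KG`, `1A3_7F9KG` and `1A3D7F9K_` respectively.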
The various implementations are not generally idiomatic for their language. They have been written to be portable with minimal effort, so they rely on little more than being able to create arrays; everything else is included in the source.