ENSNormalize.java

0-dependency ENSIP-15 in Java

import io.github.adraffy.ens.ENSNormalize;
ENSNormalize.ENSIP15 // Main Library (global instance)

Primary API ENSIP15

// String -> String
// throws on invalid names
ENSNormalize.ENSIP15.normalize("RaFFY🚴‍♂️.eTh"); // "raffy🚴‍♂.eth"

// works like normalize()
ENSNormalize.ENSIP15.beautify("1⃣2⃣.eth"); // "1️⃣2️⃣.eth"

Additional NormDetails (Experimental)

// works like normalize(), throws on invalid names
// string -> NormDetails
NormDetails details = ENSNormalize.ENSIP15.normalizeDetails("💩ì.a");

String name; // normalized name
boolean possiblyConfusing; // if name should be carefully reviewed
HashSet<Group> groups; // unique groups in name
HashSet<EmojiSequence> emojis; // unique emoji in name
String groupDescription(); // group summary for name, e.g. "Emoji+Latin"
boolean hasZWJEmoji(); // if any emoji contain 200D

Output-based Tokenization Label

// String -> List<Label>
// never throws
List<Label> labels = ENSNormalize.ENSIP15.split("💩Raffy.eth_");
// [
//   Label {
//     input: [ 128169, 82, 97, 102, 102, 121 ],  
//     tokens: [
//       OutputToken { cps: [ 128169 ], emoji: EmojiSequence { ... } }
//       OutputToken { cps: [ 114, 97, 102, 102, 121 ] }
//     ],
//     normalized: [ 128169, 114, 97, 102, 102, 121 ],
//     group: Group { name: "Latin", ... }
//   },
//   Label {
//     input: [ 101, 116, 104, 95 ],
//     tokens: [ 
//       OutputToken { cps: [ 101, 116, 104, 95 ] }
//     ],
//     error: NormException { kind: "underscore allowed only at start" }
//   }
// ]
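
The `input` arrays above are the Unicode code points of each label. As a rough illustration using only the JDK (this is not the library's `split()` implementation; the class and method names below are made up for the sketch), the same arrays can be reproduced by splitting on "." and decoding each piece:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: reproduces the `input` code point arrays shown above.
// LabelCodePoints and inputCodePoints are hypothetical names, not library API.
class LabelCodePoints {

    // Split a name on "." (keeping empty labels) and decode each label to code points.
    static List<int[]> inputCodePoints(String name) {
        List<int[]> out = new ArrayList<>();
        for (String label : name.split("\\.", -1)) {
            out.add(label.codePoints().toArray());
        }
        return out;
    }

    public static void main(String[] args) {
        // "\uD83D\uDCA9" is U+1F4A9 (128169) written as a surrogate pair
        for (int[] cps : inputCodePoints("\uD83D\uDCA9Raffy.eth_")) {
            System.out.println(Arrays.toString(cps));
        }
        // [128169, 82, 97, 102, 102, 121]
        // [101, 116, 104, 95]
    }
}
```

Unlike this sketch, `split()` also tokenizes each label, normalizes it, and records either a group or an error.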

Normalization Properties

  • Group - ENSIP15.groups: List<Group>
  • EmojiSequence - ENSIP15.emojis: List<EmojiSequence>
  • Whole - ENSIP15.wholes: List<Whole>

Error Handling

All errors are safe to print. NormException { kind: string, reason: string? } is the base exception. Functions that accept names as input wrap their exceptions in InvalidLabelException { start, end, error: NormException } for additional context.

  • "disallowed character" - DisallowedCharacterException { cp }
  • "illegal mixture" - IllegalMixtureException { cp, group, other? }
  • "whole-script confusable" - ConfusableException { group, other }
  • "empty label"
  • "duplicate non-spacing marks"
  • "excessive non-spacing marks"
  • "leading fenced"
  • "adjacent fenced"
  • "trailing fenced"
  • "leading combining mark"
  • "emoji + combining mark"
  • "invalid label extension"
  • "underscore allowed only at start"
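
The wrapping pattern described above can be sketched with stand-in types. Everything in this block is hypothetical: the two exception classes only mirror the field names listed above, the underscore check is a toy, and `start`/`end` are assumed to be char offsets. The real classes live in the library and may differ in constructors and accessors.

```java
// Stand-in sketch of the error-wrapping pattern; not the library's classes.
class ErrorWrappingSketch {

    // Base error: a short kind plus an optional reason (may be null).
    static class NormException extends RuntimeException {
        final String kind;
        final String reason;
        NormException(String kind, String reason) {
            super(reason == null ? kind : kind + ": " + reason);
            this.kind = kind;
            this.reason = reason;
        }
    }

    // Wrapper that records which slice of the name failed.
    static class InvalidLabelException extends RuntimeException {
        final int start, end; // assumed: char offsets of the label within the name
        final NormException error;
        InvalidLabelException(int start, int end, NormException error) {
            super("Invalid label [" + start + "," + end + "): " + error.getMessage(), error);
            this.start = start;
            this.end = end;
            this.error = error;
        }
    }

    // Toy check: underscores may only form a leading run.
    static void checkUnderscores(String label) {
        int i = 0;
        while (i < label.length() && label.charAt(i) == '_') i++;
        if (label.indexOf('_', i) >= 0) {
            throw new NormException("underscore allowed only at start", null);
        }
    }

    // Check every label, wrapping any failure with the label's offsets.
    static void checkName(String name) {
        int start = 0;
        for (String label : name.split("\\.", -1)) {
            int end = start + label.length();
            try {
                checkUnderscores(label);
            } catch (NormException e) {
                throw new InvalidLabelException(start, end, e);
            }
            start = end + 1; // skip the "."
        }
    }

    public static void main(String[] args) {
        try {
            checkName("raffy.eth_");
        } catch (InvalidLabelException e) {
            System.out.println(e.start + ".." + e.end + " " + e.error.kind);
            // 6..10 underscore allowed only at start
        }
    }
}
```

Callers that only need a printable message can use the outer exception; callers that branch on the failure can inspect the wrapped error's kind.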

Utilities

Normalize name fragments for substring search:

// String -> String
// throws only InvalidLabelException wrapping DisallowedCharacterException
ENSNormalize.ENSIP15.normalizeFragment("AB--");
ENSNormalize.ENSIP15.normalizeFragment("..\u0300");
ENSNormalize.ENSIP15.normalizeFragment("\u03BF\u043E");
// note: normalize() throws on these

Construct safe strings:

// int -> String
ENSNormalize.ENSIP15.safeCodepoint(0x303); // "◌̃ {303}"
ENSNormalize.ENSIP15.safeCodepoint(0xFE0F); // "{FE0F}"
// int[] -> String
ENSNormalize.ENSIP15.safeImplode(0x303, 0xFE0F); // "◌̃{FE0F}"
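
Judging only from the two examples above, the `{HEX}` part of the output looks like the code point in uppercase hexadecimal without zero padding. A minimal JDK-only sketch of that formatting follows; `BraceHex.quote` is a hypothetical helper, not a library method, and it does not reproduce the "◌" prefix used for combining marks:

```java
// Hypothetical helper illustrating the "{HEX}" notation seen above.
class BraceHex {

    // Format a code point as "{HEX}" using uppercase hexadecimal, no padding.
    static String quote(int cp) {
        return String.format("{%X}", cp);
    }

    public static void main(String[] args) {
        System.out.println(quote(0xFE0F)); // {FE0F}
        System.out.println(quote(0x303));  // {303}
    }
}
```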

Determine if a character shouldn't be printed directly:

// ReadOnlyIntSet 
ENSNormalize.ENSIP15.shouldEscape.contains(0x202E); // RIGHT-TO-LEFT OVERRIDE => true

Determine if a character is a combining mark:

// ReadOnlyIntSet
ENSNormalize.ENSIP15.combiningMarks.contains(0x20E3); // COMBINING ENCLOSING KEYCAP => true

Unicode Normalization Forms NF

import io.github.adraffy.ens.ENSNormalize;

// String -> String
ENSNormalize.NF.NFC("\u0065\u0300"); // "\u00E8"
ENSNormalize.NF.NFD("\u00E8");       // "\u0065\u0300"

// int[] -> int[]
ENSNormalize.NF.NFC(0x65, 0x300); // [0xE8]
ENSNormalize.NF.NFD(0xE8);        // [0x65, 0x300]
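
For a quick cross-check of the example above, the JDK's own `java.text.Normalizer` produces the same results for this pair. This is a comparison sketch only: the JDK follows the Unicode version of the running Java release, so it may not agree with the library on every input. The class and method names are made up for the sketch.

```java
import java.text.Normalizer;
import java.util.Arrays;

// Cross-check using java.text.Normalizer; not part of the library.
class JdkNormalizerCheck {

    // int[] -> int[] via NFC
    static int[] nfc(int... cps) {
        String s = new String(cps, 0, cps.length);
        return Normalizer.normalize(s, Normalizer.Form.NFC).codePoints().toArray();
    }

    // int[] -> int[] via NFD
    static int[] nfd(int... cps) {
        String s = new String(cps, 0, cps.length);
        return Normalizer.normalize(s, Normalizer.Form.NFD).codePoints().toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nfc(0x65, 0x300))); // [232]      (0xE8)
        System.out.println(Arrays.toString(nfd(0xE8)));        // [101, 768] (0x65, 0x300)
    }
}
```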

Publish Instructions