ENSNormalize.cs

0-dependency ENSIP-15 in C#

Reference Implementation: adraffy/ens-normalize.js
- Unicode: 15.1.0
- Spec Hash: 1f6d3bdb7a724fe3b91f6d73ab14defcb719e0f4ab79022089c940e7e9c56b9c
Passes 100% ENSIP-15 Validation Tests
Passes 100% Unicode Normalization Tests
Space Efficient: ~58KB .dll using Inline Blobs via make.js
Legacy Support: netstandard1.1, net35, netcoreapp3.1
Nuget Repository:

using ADRaffy.ENSNormalize;
ENSNormalize.ENSIP15 // Main Library (global instance)

Primary API ENSIP15

// string -> string
// throws on invalid names
ENSNormalize.ENSIP15.Normalize("RaFFY🚴‍♂️.eTh"); // "raffy🚴‍♂.eth"

// works like Normalize(), but emits fully-qualified emoji (restores FE0F)
ENSNormalize.ENSIP15.Beautify("1⃣2⃣.eth"); // "1️⃣2️⃣.eth"

Additional NormDetails (Experimental)

// works like Normalize(), throws on invalid names
// string -> NormDetails
NormDetails details = ENSNormalize.ENSIP15.NormalizeDetails("💩ì.a");

string Name; // normalized name
bool PossiblyConfusing; // if name should be carefully reviewed
HashSet<Group> Groups; // unique groups in name
HashSet<EmojiSequence> Emojis; // unique emoji in name
string GroupDescription = "Emoji+Latin"; // group summary for name
bool HasZWJEmoji; // if any emoji contain 200D
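
As a rough sketch of how these members might be used, the helper below builds a one-line review summary for a name. `DetailsExample` and `Describe` are hypothetical names for illustration, not part of the library; only the `NormDetails` members listed above are assumed.

```csharp
using System;
using ADRaffy.ENSNormalize;

static class DetailsExample
{
    // Hypothetical helper: summarizes a name using the NormDetails members listed above.
    public static string Describe(string name)
    {
        // Throws on invalid names, like Normalize()
        NormDetails details = ENSNormalize.ENSIP15.NormalizeDetails(name);
        return details.Name + " [" + details.GroupDescription + "]"
             + (details.PossiblyConfusing ? " (review carefully)" : "")
             + (details.HasZWJEmoji ? " (contains ZWJ emoji)" : "");
    }
}
```

Since `NormDetails` is marked experimental, a wrapper like this keeps callers isolated from future changes to its shape.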

Output-based Tokenization Label

// string -> Label[]
// never throws
Label[] labels = ENSNormalize.ENSIP15.Split("💩Raffy.eth_");
// [
//   Label {
//     Input: [ 128169, 82, 97, 102, 102, 121 ],
//     Tokens: [
//       OutputToken { Codepoints: [ 128169 ], IsEmoji: true },
//       OutputToken { Codepoints: [ 114, 97, 102, 102, 121 ] }
//     ],
//     Normalized: [ 128169, 114, 97, 102, 102, 121 ],
//     Group: Group { Name: "Latin", ... }
//   },
//   Label {
//     Input: [ 101, 116, 104, 95 ],
//     Tokens: [ 
//       OutputToken { Codepoints: [ 101, 116, 104, 95 ] }
//     ],
//     Error: NormException { Kind: "underscore allowed only at start" }
//   }
// ]
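
Because `Split()` never throws, it can be used to report on each label separately. The sketch below assumes `Label.Normalized` is an enumerable of `int` codepoints and `Label.Error` is null for valid labels, as the output above suggests; `SplitExample` and `Report` are hypothetical names.

```csharp
using System;
using System.Linq;
using ADRaffy.ENSNormalize;

static class SplitExample
{
    // Hypothetical helper: one line per label, either the normalized text or the error kind.
    public static string[] Report(string name)
    {
        return ENSNormalize.ENSIP15.Split(name)
            .Select(label => label.Error != null
                ? "error: " + label.Error.Kind
                // Rebuild a string from the normalized codepoints
                : "ok: " + string.Join("", label.Normalized.Select(cp => char.ConvertFromUtf32(cp)).ToArray()))
            .ToArray();
    }
}
```

For the input shown above, `"💩Raffy.eth_"`, this should yield one `ok:` line for the first label and one `error:` line carrying "underscore allowed only at start" for the second.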

Normalization Properties

Group — ENSIP15.Groups: IList<Group>
EmojiSequence — ENSIP15.Emojis: IList<EmojiSequence>
Whole — ENSIP15.Wholes: IList<Whole>

Error Handling

All errors are safe to print. NormException { Kind: string, Reason: string? } is the base exception. Functions that accept names as input wrap their exceptions in InvalidLabelException { Label: string, Error: NormException } for additional context.

"disallowed character" — DisallowedCharacterException { Codepoint }
"illegal mixture" — IllegalMixtureException { Codepoint, Group, OtherGroup? }
"whole-script confusable" — ConfusableException { Group, OtherGroup }
"empty label"
"duplicate non-spacing marks"
"excessive non-spacing marks"
"leading fenced"
"adjacent fenced"
"trailing fenced"
"leading combining mark"
"emoji + combining mark"
"invalid label extension"
"underscore allowed only at start"
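
A minimal sketch of handling these errors, assuming only the exception shapes described above (`InvalidLabelException` with `Label` and `Error`, `NormException` with `Kind`). `ErrorExample` and `TryNormalize` are hypothetical names, not library API.

```csharp
using System;
using ADRaffy.ENSNormalize;

static class ErrorExample
{
    // Hypothetical helper: returns true with the normalized name,
    // or false with the error kind (one of the strings listed above).
    public static bool TryNormalize(string name, out string result)
    {
        try
        {
            result = ENSNormalize.ENSIP15.Normalize(name);
            return true;
        }
        catch (InvalidLabelException e)
        {
            // e.Label is the offending label; e.Error is the underlying NormException
            result = e.Error.Kind;
            return false;
        }
    }
}
```

Catching `InvalidLabelException` rather than a broader type lets unrelated failures (for example, a null argument) surface normally.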

Utilities

Normalize name fragments for substring search:

// string -> string
// throws only InvalidLabelException wrapping a DisallowedCharacterException
ENSNormalize.ENSIP15.NormalizeFragment("AB--");
ENSNormalize.ENSIP15.NormalizeFragment("..\u0300");
ENSNormalize.ENSIP15.NormalizeFragment("\u03BF\u043E");
// note: Normalize() throws on these
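
One possible use of fragments, sketched under the assumption that the candidate names were already normalized with `Normalize()`: normalize the user's query as a fragment, then do an ordinal substring match. `SearchExample` and `Search` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using ADRaffy.ENSNormalize;

static class SearchExample
{
    // Hypothetical helper: finds normalized names that contain the query as a substring.
    public static List<string> Search(IEnumerable<string> normalizedNames, string query)
    {
        // The query may be a partial label, so full Normalize() rules would be too strict
        string fragment = ENSNormalize.ENSIP15.NormalizeFragment(query);
        return normalizedNames.Where(n => n.Contains(fragment)).ToList();
    }
}
```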

Construct safe strings:

// int -> string
ENSNormalize.ENSIP15.SafeCodepoint(0x303); // "◌̃"
ENSNormalize.ENSIP15.SafeCodepoint(0xFE0F); // "{FE0F}"
// IList<int> -> string
ENSNormalize.ENSIP15.SafeImplode(new int[]{ 0x303, 0xFE0F }); // "◌̃{FE0F}"

Determine if a character shouldn't be printed directly:

// ReadOnlyIntSet (like IReadOnlySet<int>)
ENSNormalize.ENSIP15.ShouldEscape.Contains(0x202E); // RIGHT-TO-LEFT OVERRIDE => true

Determine if a character is a combining mark:

// ReadOnlyIntSet
ENSNormalize.ENSIP15.CombiningMarks.Contains(0x20E3); // COMBINING ENCLOSING KEYCAP => true

Unicode Normalization Forms NF

using ADRaffy.ENSNormalize;

// string -> string
ENSNormalize.NF.NFC("\x65\u0300"); // "\xE8"
ENSNormalize.NF.NFD("\xE8");       // "\x65\u0300"

// IEnumerable<int> -> List<int>
ENSNormalize.NF.NFC(new int[]{ 0x65, 0x300 }); // [0xE8]
ENSNormalize.NF.NFD(new int[]{ 0xE8 });        // [0x65, 0x300]
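
As a small illustration of the string overloads, two strings can be compared for canonical equivalence by bringing both to NFC first. `NFExample` and `CanonicallyEqual` are hypothetical names, not part of the library.

```csharp
using System;
using ADRaffy.ENSNormalize;

static class NFExample
{
    // Hypothetical helper: true if both strings have the same NFC form.
    public static bool CanonicallyEqual(string a, string b)
    {
        return ENSNormalize.NF.NFC(a) == ENSNormalize.NF.NFC(b);
    }
}
```

With the values from the example above, `CanonicallyEqual("e\u0300", "\u00E8")` compares a decomposed and a precomposed "è".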
