0-dependency ENSIP-15 in C#
- Reference Implementation: adraffy/ens-normalize.js
  - Unicode: 15.1.0
  - Spec Hash: 1f6d3bdb7a724fe3b91f6d73ab14defcb719e0f4ab79022089c940e7e9c56b9c
- Passes 100% ENSIP-15 Validation Tests
- Passes 100% Unicode Normalization Tests
- Space Efficient: ~58KB .dll using Inline Blobs via make.js
- Legacy Support: netstandard1.1, net35, netcoreapp3.1
- Nuget Repository:
using ADRaffy.ENSNormalize;
ENSNormalize.ENSIP15 // Main Library (global instance)
Primary API ENSIP15
// string -> string
// throws on invalid names
ENSNormalize.ENSIP15.Normalize("RaFFY🚴♂️.eTh"); // "raffy🚴♂.eth"
// works like Normalize()
ENSNormalize.ENSIP15.Beautify("1⃣2⃣.eth"); // "1️⃣2️⃣.eth"
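Because Normalize() throws on invalid names, callers will usually want to wrap it. The following is a minimal sketch of one way to do that, assuming the exception types behave as described in the error section of this README (an InvalidLabelException wrapping a NormException). NormalizeExample and TryNormalize are hypothetical names used for illustration, not part of the library.

```csharp
using System;
using ADRaffy.ENSNormalize;

public static class NormalizeExample
{
    // Hypothetical helper (not a library API).
    // Returns true and the normalized name on success;
    // returns false and a printable error message on failure.
    public static bool TryNormalize(string name, out string result)
    {
        try
        {
            result = ENSNormalize.ENSIP15.Normalize(name);
            return true;
        }
        catch (InvalidLabelException e)
        {
            // The README states that all errors are safe to print,
            // so the exception message is surfaced as-is.
            // e.Error.Kind could be inspected here to branch on the error type.
            result = e.Message;
            return false;
        }
    }
}
```

A caller could then write `if (NormalizeExample.TryNormalize(input, out var s)) { ... }` and show `s` to the user in either case.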
Additional NormDetails (Experimental)
// works like Normalize(), throws on invalid names
// string -> NormDetails
NormDetails details = ENSNormalize.ENSIP15.NormalizeDetails("💩ì.a");
string Name; // normalized name
bool PossiblyConfusing; // if name should be carefully reviewed
HashSet<Group> Groups; // unique groups in name
HashSet<EmojiSequence> Emojis; // unique emoji in name
string GroupDescription = "Emoji+Latin"; // group summary for name
bool HasZWJEmoji; // if any emoji contain 200D
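As a rough illustration of how the details above might be consumed, the sketch below builds a one-line summary and flags names that should be reviewed. It assumes Name, PossiblyConfusing and GroupDescription are readable members of NormDetails as listed above; NormDetailsExample and Describe are hypothetical names.

```csharp
using System;
using ADRaffy.ENSNormalize;

public static class NormDetailsExample
{
    // Hypothetical helper (not a library API).
    // Throws on invalid names, just like NormalizeDetails().
    public static string Describe(string name)
    {
        NormDetails details = ENSNormalize.ENSIP15.NormalizeDetails(name);
        string summary = details.Name + " [" + details.GroupDescription + "]";
        if (details.PossiblyConfusing)
        {
            // Flag names the library marks for careful review.
            summary += " (review carefully)";
        }
        return summary;
    }
}
```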
Output-based Tokenization Label
// string -> Label[]
// never throws
Label[] labels = ENSNormalize.ENSIP15.Split("💩Raffy.eth_");
// [
// Label {
// Input: [ 128169, 82, 97, 102, 102, 121 ],
// Tokens: [
// OutputToken { Codepoints: [ 128169 ], IsEmoji: true }
// OutputToken { Codepoints: [ 114, 97, 102, 102, 121 ] }
// ],
// Normalized: [ 128169, 114, 97, 102, 102, 121 ],
// Group: Group { Name: "Latin", ... }
// },
// Label {
// Input: [ 101, 116, 104, 95 ],
// Tokens: [
// OutputToken { Codepoints: [ 101, 116, 104, 95 ] }
// ],
// Error: NormException { Kind: "underscore allowed only at start" }
// }
// ]
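Since Split() never throws, each label has to be checked for an error individually. The sketch below walks the labels and reports which ones failed, assuming the Label members shown in the output above (Error, Normalized, Group) and that Normalized can be passed to SafeImplode(). SplitExample and CountInvalidLabels are hypothetical names.

```csharp
using System;
using ADRaffy.ENSNormalize;

public static class SplitExample
{
    // Hypothetical helper (not a library API).
    // Prints one line per label and returns the number of invalid labels.
    public static int CountInvalidLabels(string name)
    {
        int invalid = 0;
        foreach (var label in ENSNormalize.ENSIP15.Split(name))
        {
            if (label.Error != null)
            {
                // Error is only set when the label failed to normalize.
                Console.WriteLine("invalid: " + label.Error.Kind);
                invalid++;
            }
            else
            {
                // SafeImplode() turns the normalized code points into a printable string.
                string text = ENSNormalize.ENSIP15.SafeImplode(label.Normalized);
                Console.WriteLine("ok: " + text + " (" + label.Group?.Name + ")");
            }
        }
        return invalid;
    }
}
```

For the input used above, the first label is valid and the second fails with "underscore allowed only at start", so the count would be 1.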
- Group: ENSIP15.Groups (IList<Group>)
- EmojiSequence: ENSIP15.Emojis (IList<EmojiSequence>)
- Whole: ENSIP15.Wholes (IList<Whole>)
All errors are safe to print. NormException { Kind: string, Reason: string? } is the base exception. Functions that accept names as input wrap their exceptions in InvalidLabelException { Label: string, Error: NormException } for additional context.
Each error is identified by its Kind string; some kinds have a dedicated exception type:
- "disallowed character": DisallowedCharacterException { Codepoint }
- "illegal mixture": IllegalMixtureException { Codepoint, Group, OtherGroup? }
- "whole-script confusable": ConfusableException { Group, OtherGroup }
- "empty label"
- "duplicate non-spacing marks"
- "excessive non-spacing marks"
- "leading fenced"
- "adjacent fenced"
- "trailing fenced"
- "leading combining mark"
- "emoji + combining mark"
- "invalid label extension"
- "underscore allowed only at start"
Normalize name fragments for substring search:
// string -> string
// only throws InvalidLabelException w/DisallowedCharacterException
ENSNormalize.ENSIP15.NormalizeFragment("AB--");
ENSNormalize.ENSIP15.NormalizeFragment("..\u0300");
ENSNormalize.ENSIP15.NormalizeFragment("\u03BF\u043E");
// note: Normalize() throws on these
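One possible use of NormalizeFragment() is matching a user's partial query against names that were already normalized. The sketch below assumes the stored name came from Normalize(); FragmentSearchExample and NameContains are hypothetical names.

```csharp
using System;
using ADRaffy.ENSNormalize;

public static class FragmentSearchExample
{
    // Hypothetical helper (not a library API).
    // normalizedName: a name previously returned by Normalize().
    // query: raw user input, which may be a partial or otherwise invalid name.
    public static bool NameContains(string normalizedName, string query)
    {
        // May still throw InvalidLabelException if the query contains a disallowed character.
        string fragment = ENSNormalize.ENSIP15.NormalizeFragment(query);
        // string.Contains(string) performs an ordinal comparison.
        return normalizedName.Contains(fragment);
    }
}
```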
Construct safe strings:
// int -> string
ENSNormalize.ENSIP15.SafeCodepoint(0x303); // "◌̃"
ENSNormalize.ENSIP15.SafeCodepoint(0xFE0F); // "{FE0F}"
// IList<int> -> string
ENSNormalize.ENSIP15.SafeImplode(new int[]{ 0x303, 0xFE0F }); // "◌̃{FE0F}"
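SafeImplode() takes code points, while C# strings are UTF-16, so an arbitrary string has to be split into code points first. The sketch below shows one way to do that with standard .NET calls. SafePrintExample, Explode and SafeString are hypothetical names, and the library may already expose an equivalent helper.

```csharp
using System;
using System.Collections.Generic;
using ADRaffy.ENSNormalize;

public static class SafePrintExample
{
    // Hypothetical helper: split a UTF-16 string into Unicode code points.
    // A valid surrogate pair becomes one code point; a lone surrogate is kept as-is.
    public static List<int> Explode(string s)
    {
        var cps = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
            {
                cps.Add(char.ConvertToUtf32(s, i));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                cps.Add(s[i]);
            }
        }
        return cps;
    }

    // Hypothetical helper: make an arbitrary string safe to print.
    public static string SafeString(string s)
    {
        return ENSNormalize.ENSIP15.SafeImplode(Explode(s));
    }
}
```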
Determine if a character shouldn't be printed directly:
// ReadOnlyIntSet (like IReadOnlySet<int>)
ENSNormalize.ENSIP15.ShouldEscape.Contains(0x202E); // RIGHT-TO-LEFT OVERRIDE => true
Determine if a character is a combining mark:
// ReadOnlyIntSet
ENSNormalize.ENSIP15.CombiningMarks.Contains(0x20E3); // COMBINING ENCLOSING KEYCAP => true
Unicode Normalization Forms NF
using ADRaffy.ENSNormalize;
// string -> string
ENSNormalize.NF.NFC("\x65\u0300"); // "\xE8"
ENSNormalize.NF.NFD("\xE8"); // "\x65\u0300"
// IEnumerable<int> -> List<int>
ENSNormalize.NF.NFC(new int[]{ 0x65, 0x300 }); // [0xE8]
ENSNormalize.NF.NFD(new int[]{ 0xE8 }); // [0x65, 0x300]
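A common reason to normalize is to compare strings that look identical but are encoded differently. The sketch below compares two strings by their NFC forms, using only the NF.NFC(string) call shown above; NFExample and CanonicallyEqual are hypothetical names.

```csharp
using System;
using ADRaffy.ENSNormalize;

public static class NFExample
{
    // Hypothetical helper (not a library API).
    // Two strings are treated as equal if their NFC forms match exactly.
    public static bool CanonicallyEqual(string a, string b)
    {
        return string.Equals(
            ENSNormalize.NF.NFC(a),
            ENSNormalize.NF.NFC(b),
            StringComparison.Ordinal);
    }
}
```

With the pair used above, "e\u0300" and "\u00E8" would compare as equal, while "e" and "\u00E8" would not.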