This library provides a human-friendly way to write complex regular expression patterns.
The patterns generated by this library always work with Unicode code points, even if the target is JavaScript or C#.
Php | Python | Java{6/7}/JVM | C# | Js/Node/ES{5/6} | Interp | Neko | HashLink | Lua | CPP | Flash |
---|---|---|---|---|---|---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ | ➖ | ➖ |
var pattern1 = (RxPattern.Char("a") | RxPattern.Char("b")).many();
var rx1:EReg = pattern1.build();
rx1.match("abaab"); // => true
var pattern2 = RxPattern.String("gr")
>> (RxPattern.Char("a") | RxPattern.Char("e"))
>> RxPattern.String("y");
var rx2 = pattern2.build();
rx2.match("grey"); // => true
rx2.match("gray"); // => true
var pattern3 = RxPattern.AtStart
>> RxPattern.String("colo")
>> RxPattern.Char("u").option()
>> RxPattern.String("r")
>> RxPattern.AtEnd;
var rx3 = pattern3.build();
rx3.match("color"); // => true
rx3.match("colour"); // => true
rx3.match("color\n"); // => false
rx3.match("\ncolour"); // => false
var wordStart = Category.L | RxPattern.Char("_");
var wordChar = wordStart | Category.N;
var word = wordStart >> wordChar.many();
var pattern4 = RxPattern.AtStart >> word >> RxPattern.AtEnd;
var rx4 = pattern4.build();
rx4.match("function"); // => true
rx4.match("int32_t"); // => true
rx4.match("\u3042"); // => true
rx4.match("24hours"); // => false
This library provides the following classes:
rxpattern.RxPattern
rxpattern.CharSet
rxpattern.Category
RxPattern.AnyCodePoint:RxPattern
- Matches any Unicode code point, i.e. U+0000 to U+10FFFF (may or may not excluding surrogates).
RxPattern.Char(c:String):RxPattern
- Matches a Unicode code point represented by the string
c
. c
must consist of a single code point.
- Matches a Unicode code point represented by the string
RxPattern.String(s:String):RxPattern
- Matches a string.
- Special characters are escaped.
RxPattern.LineTerminator:RxPattern
- Matches a line terminator.
- The following sequence / characters are treated as a line terminator:
- CR LF
- CR
- LF
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
- TODO: Also include U+0085 NEL?
RxPattern.Empty:RxPattern
- Matches an empty string.
The variables pattern1
and pattern2
are of type RxPattern
.
pattern1 >> pattern2
- Matches the sequence of
pattern
s.
- Matches the sequence of
pattern1 | pattern2
- Matches
pattern1
orpattern2
.
- Matches
pattern1.then(pattern2):RxPattern
- Same as
pattern1 >> pattern2
.
- Same as
pattern1.or(pattern2):RxPattern
- Same as
pattern1 | pattern2
.
- Same as
RxPattern.sequence(patterns:Iterable<RxPattern>):RxPattern
- Applies
>>
to the elements ofpatterns
. - Returns
RxPattern.Empty
ifpatterns
is empty.
- Applies
RxPattern.choice(patterns:Iterable<RxPattern>):RxPattern
- Applies
|
to the elements ofpatterns
. - Returns
RxPattern.Never
ifpatterns
is empty.
- Applies
The variable pattern
is of type RxPattern
.
pattern.option():RxPattern
- Matches
pattern
or an empty string. - Equivalent to
pattern | RxPattern.Empty
.
- Matches
pattern.many():RxPattern
- Matches zero or more repetition of
pattern
.
- Matches zero or more repetition of
pattern.many1():RxPattern
- Matches one or more repetition of
pattern
.
- Matches one or more repetition of
TODO: Add methods for the quantifiers {m}
, {m,}
and {m,n}
.
RxPattern.AtStart:RxPattern
- Matches at the start of the string.
RxPattern.AtEnd:RxPattern
- Matches at the end of the string.
RxPattern.LookAhead(pattern:RxPattern):RxPattern
- Positive look ahead.
RxPattern.NotFollowedBy(pattern:RxPattern):RxPattern
- Negative look ahead.
RxPattern.Never:RxPattern
- Never matches anything.
- Equivalent to
RxPattern.NotFollowedBy(RxPattern.Empty)
.
RxPattern.Group(pattern:RxPattern):RxPattern
- Creates a capture group.
Since non-capturing groups are automatically created when necessary, there is no function to explicitly create them.
The variable pattern
is of type RxPattern
.
pattern.build(options = "u"):EReg
- Build an
EReg
object withpattern
.
- Build an
pattern.get():String
- Get the pattern string.
RxPattern.buildEReg(pattern:RxPattern, options = "u"):EReg
- Same as
pattern.build(options)
- Same as
RxPattern.getPattern(pattern:RxPattern):String
- Same as
pattern.get()
- Same as
The variable charset
is of type CharSet
.
RxPattern.CharSet(set:CharSet):RxPattern
RxPattern.NotInSet(set:CharSet):RxPattern
CharSet.empty():CharSet
- Returns an empty character set.
CharSet.singleton(c:String):CharSet
- Returns a character set with one element
c
.
- Returns a character set with one element
CharSet.fromString(s:String):CharSet
- Returns a character set with elements from the string
s
.
- Returns a character set with elements from the string
CharSet.intersection(a:CharSet, b:CharSet):CharSet
CharSet.union(a:CharSet, b:CharSet):CharSet
CharSet.difference(a:CharSet, b:CharSet):CharSet
charset.has(c:String):Bool
- The string
c
must consist of a single code point.
- The string
charset.add(c:String):Void
- The string
c
must consist of a single code point.
- The string
charset.remove(c:String):Void
- The string
c
must consist of a single code point.
- The string
charset.hasCodePoint(x:Int):Bool
charset.addCodePoint(x:Int):Void
charset.removeCodePoint(x:Int):Void
charset.codePointIterator():Iterator<Int>
charset.length:Int
This library provides RxPattern
values corresponding Unicode general categories.
If Unicode properties (or, \p{}
patterns) are available, they are used.
Otherwise, patterns generated from the latest supported Unicode data in skial/seri
will be used.
abstract Category {
var Cc;
var Cf;
var Co;
var Cs;
var C;
var Ll;
var Lm;
var Lo;
var Lt;
var Lu;
var L;
var Mc;
var Me;
var Mn;
var M;
var Nd;
var Nl;
var No;
var N;
var Pc;
var Pd;
var Pe;
var Pf;
var Pi;
var Po;
var Ps;
var P;
var Sc;
var Sk;
var Sm;
var So;
var S;
var Zl;
var Zp;
var Zs;
var Z;
var Letter_UpperCase;
var Letter_LowerCase;
var Letter_TitleCase;
var Letter_Modifier;
var Mark_NonSpacing;
var Mark_SpacingCombining;
var Mark_Enclosing;
var Number_DecimalDigit;
var Number_Letter;
var Number_Other;
var Punctuation_Connector;
var Punctuation_Dash;
var Punctuation_Open;
var Punctuation_Close;
var Punctuation_InitialQuote;
var Punctuation_FinalQuote;
var Punctuation_Other;
var Symbol_Math;
var Symbol_Currency;
var Symbol_Modifier;
var Symbol_Other;
var Separator_Space;
var Separator_Line;
var Separator_Paragraph;
var Other_Control;
var Other_Format;
var Other_Surrogate;
var Other_PrivateUse;
}
The terms "Disjunction", "Alternative", "Term" and "Atom" correspond to the rules in Typical Regular Expression Syntax.
The variable pattern
is of type RxPattern
.
RxPattern.Disjunction(s:String):RxPattern
- Returns a
RxPattern
value with given pattern string.
- Returns a
RxPattern.Alternative(s:String):RxPattern
- Returns a
RxPattern
value with given pattern string. - The string
s
must be able to be used as an Alternative: that is, the patterns + "a"
matchess
and the charactera
.
- Returns a
RxPattern.Term(s:String):RxPattern
- Returns a
RxPattern
value with given pattern string. - The string
s
must be able to be used as a Term.
- Returns a
RxPattern.Atom(s:String):RxPattern
- Returns a
RxPattern
value with given pattern string. - The string
s
must be able to be used as an Atom: that is, the patterns + "*"
does mean zero or more repetition ofs
.
- Returns a
pattern.toDisjunction():String
- Returns a pattern string that can be used as a Disjunction. This is same as
pattern.get()
.
- Returns a pattern string that can be used as a Disjunction. This is same as
pattern.toAlternative():String
- Return a pattern string that can be used as an Alternative.
- The string is surrounded by a non-capturing group if necessary.
pattern.toTerm():String
- Return a pattern string that can be used as a Term.
- The string is surrounded by a non-capturing group if necessary.
pattern.toAtom():String
- Return a pattern string that can be used as an Atom.
- The string is surrounded by a non-capturing group if necessary.
Pattern ::= Disjunction
Disjunction ::= Alternative
| Alternative "|" Disjunction
Alternative ::= ""
| Alternative Term
Term ::= Assertion
| Atom
| Atom Quantifier
Assertion ::= "^" | "$"
| "(?=" Disjunction ")"
| "(?!" Disjunction ")"
Quantifier ::= "*" | "+" | "?"
Atom ::= PatternCharacter
| "\" AtomEscape
| CharacterClass
| "(" Disjunction ")"
| "(?:" Disjunction ")"
The file/library regenerate.js
is created by @mathiasbynens. Core functionality was ported to Haxe, see utf16/RangeUtil.hx
.