Excel regular expression add-in using .NET regex engine and ExcelDNA integration.
- 1.3.1 MAZ 2020-02-10
- Changed the handling of the
input
text parameter: empty input strings do not trigger a #VALUE error anymore,null
still does, though ExcelDNA doesn't seem to pass any nulls.
- Changed the handling of the
- 1.3 MAZ 2020-02-03
- Added
RegexGroupMatches()
- Tagged all the functions as thread-safe, improving performance.
- Fixed handling of optional parms, such as IncludeDuplicates, which did not handle defaults correctly
- Added
- 1.2 MAZ 2020-02-02
- Added
RegexMatches()
- Fixed
RegexMatch()
return value when not matched (did not return #NA correctly)
- Added
- 1.1 MAZ 2020-02-02
- Added
IsRegexMatch()
,RegexEscape()
,RegexMatchGroups()
- Few minor fixes
- Added
- 1.0 MAZ 2020-01-29
- Released with
RegexMatch()
andRegexReplace()
- Implemented Excel Intellisense functionality using decorations
- ToDo:
- Add thread-safe ExcelDNA registration. This assumes we never have a Regex.Match object jump threads.
- Consider for later:
- V1 function calls from inside Excel are now pretty fast, but consider memoization for the future
- Released with
ExcelMAZRegez is a simple, fast .NET regular expression library for Excel. As of v1 it's only for Excel formulas inside a worksheet; a later version might implement it for use inside VBA.
It's several orders of magnitude faster than using the VBA scripting library, and it's less convoluted than the few other Excel .NET regex libraries I found out there.
See the .Net regular expression documentation for a full description of regexes, including search and replacement patterns, and how options work.
The formula use and syntax show up in the Excel Intellisense UI, and below for reference:
- Download the
\*-packed.xll
andExcelDna.IntelliSense.dll
files from the project repository Releases page - Copy the appropriate
\*.xll
add-in file (32 or 64 bit) to your Excel add-ins folder (%APPDATA%\Microsoft\AddIns
on Windows). - Copy the
ExcelDna.IntelliSense.dll
file into the add-ins folder. - Turn on the add-in in the Excel manage addins dialog (Alt+t,i).
RegexEscape( text )
Return the input text with special characters escaped. Useful to construct regex patterns with arbitrary text that will not be interpreted for special pattern interpretations, such as when a string includes "[", "$", etc.
RegexMatch( input, pattern [, options [, replacement ] ] )
Finds and returns the text of the first instance of the regular expression pattern inside the input string, optionally modified with the option flags, and optionally with a replacement pattern.
The options
are bit flags (see below), and sould be added up to specify more than one optoin. E.g.: ignore case plus multilines is 3 (1 + 2).
- 1 = IgnoreCase
- 2 = Multiline
- 4 = ExplicitCapture
- 8 = Compiled
- 16 = Singleline
- 32 = IgnorePatternWhitespace
- 64 = RightToLeft
- 256 = ECMAScript
- 512 = CultureInvariant
if not specified, the replacement patterns defaults to "$0".
- First instance of text in input that conforms to the input pattern and options, optionally modified by a replacement pattern
- #VALUE error if the input or pattern are empty strings
- #NA error if the pattern is not found
RegexMatches( input, pattern [, options [, replacement ] ] )
Finds all the occurrences of the pattern in the input. Returns delimiter-separated list of matches with optional replacement pattern.
- Same as
RegexMatch()
, plus... delimiter
: Delimiter for the list of results, default ','
- String with delimiter-separated list of matches found, optionally modified by replacement pattern.
IsRegexMatch( input, pattern [, options ] )
- Same as
RegexMatch()
, except forreplacement
, which is not used.
TRUE if the pattern is found in the input, FALSE otherwise.
RegexMatchGroups( input, pattern [, options [, MaxMatches [, MaxGroups [, IncludeDuplicates ] ] ] ] )
Search the input for matches of the pattern, return a comma delimited list of matching capture group names/numbers in match order.
Useful to find out what chunks of a regular expression were matched against, without actually caring what the text that matched was. I personally use it a lot to label data, see the examples below.
- The
input
,pattern
, andoptions
parameters are the same as forRegexMatch()
. MaxMatches
: Maximum number of matches to execute on the input (omit or 0 to return all matches)MaxGroups
: Maximum number of group names or numbers to return for each match (omit or 0 for all groups)IncludeDuplicates
: Default TRUE: Print group names every time they're found in a match. FALSE: Only return the first instance of each capture group.
- If the pattern has no capture groups, we return group number "0" for each qualifying match.
- The groups within each match don't show in the same order as in the pattern, first come all the numbered (unnamed) groups, then all the named ones, so caveat emptor: don't assume the order of the groups inside each match is in pattern appearance order, nor input match index order. See https://docs.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions?view=netframework-4.8#grouping-constructs-and-regular-expression-objects
The formula:
=RegexMatchGroups('liliac,red,mauve,green','<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)'
)- returns:
artsy,primary,artsy,primary
To only return the first match, you'd use:
=RegexMatchGroups('mauve,red,green','<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)'
,,1)- returns:
artsy
To return all matches but not repeat group names, you set IncludeDuplicates=FALSE
:
=RegexMatchGroups('mauve,red,green','<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)'
,,,,FALSE)- returns:
artsy,primary
The MaxGroups
sets the max number of groups per match. Handy, for example, when different subexpressions may match the same text, and you only care for the first group that does. For example, see the difference between not using MaxGroups
:
=RegexMatchGroups('mauve,red,green','(?<funky>green|lilac)<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)'
)- returns:
artsy,primary,funky,primary
... and setting MaxGroups=1
:
=RegexMatchGroups('mauve,red,green','(?<funky>green|lilac)<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)'
,,,1)- returns:
artsy,primary,funky
RegexMatchGroups( input, pattern [, options [, MaxMatches [, MaxGroups [, IncludeDuplicates [, GroupNamesTransformPattern [, GroupNamesTransformReplacement ] ] ] ] ] ] )
Search the input for matches of the pattern, return a comma delimited list of matching capture group names/numbers in capture group order within the pattern.
The difference between RegexMatchGroups()
and RegexGroupMatches()
is that the first reports the results in the order of matches in the input, while the latter reports the results in the order of the search pattern. Besides, as a convenience, it allows to do a match/replace on the returned group names.
- The
input
,pattern
,options
,MaxMatches
, andIncludeDuplicates
parameters are the same as forRegexMatchGroups()
. MaxGroups
: Maximum total number of group names or numbers to return (omit or 0 for all groups).GroupNamesTransformPattern
: Transform group names on output list, regex pattern for searchGroupNamesTransformReplacement
: Transform group names on output list, regex replacement pattern
- This function is useful to check for multiple pattern matches using a single regular expression, give each pattern a name in a capture group, and return the names of the groups that match, more importantly: in the order of the definition of the capture groups. This makes it a very succint mechanism to tag data with arbitrary precedence rules.
To Do: Include examples from CalIns project showing rule precedence
RegexReplace( input, pattern [, options [, replacement ] ] )
Finds all the instances of the search pattern in the input text, optionally modified with the option flags, replaces them with the replacement pattern, and returns the modifed input. Similar to RegexMatch()
but searching and replacing the entire input, and the replacement
pattern default is an empty string ("").
The parameters, and options for RegexReplace() are the same as for RegexMatch(), except for the replacement
default.
- The input text with every instance of the search pattern + options replaced by the replacement pattern
- #VALUE error if the input or pattern are empty strings
- If you ever get a #NUM error, it's an internal error and shouldn't happen. Please raise an issue if it does, with replicable example(s).