/ExcelMazRegex

Excel regular expressions add-in using ExcelDNA 1.0

Primary LanguageC#MIT LicenseMIT

ExcelMazRegex

Excel regular expression add-in using .NET regex engine and ExcelDNA integration.

Version History:

  • 1.3.1 MAZ 2020-02-10
    • Changed the handling of the input text parameter: empty input strings do not trigger a #VALUE error anymore, null still does, though ExcelDNA doesn't seem to pass any nulls.
  • 1.3 MAZ 2020-02-03
    • Added RegexGroupMatches()
    • Tagged all the functions as thread-safe, improving performance.
    • Fixed handling of optional parms, such as IncludeDuplicates, which did not handle defaults correctly
  • 1.2 MAZ 2020-02-02
    • Added RegexMatches()
    • Fixed RegexMatch() return value when not matched (did not return #NA correctly)
  • 1.1 MAZ 2020-02-02
    • Added IsRegexMatch(), RegexEscape(), RegexMatchGroups()
    • Few minor fixes
  • 1.0 MAZ 2020-01-29
    • Released with RegexMatch() and RegexReplace()
    • Implemented Excel Intellisense functionality using decorations
    • ToDo:
      • Add thread-safe ExcelDNA registration. This assumes we never have a Regex.Match object jump threads.
    • Consider for later:
      • V1 function calls from inside Excel are now pretty fast, but consider memoization for the future

Overview

ExcelMAZRegez is a simple, fast .NET regular expression library for Excel. As of v1 it's only for Excel formulas inside a worksheet; a later version might implement it for use inside VBA.

It's several orders of magnitude faster than using the VBA scripting library, and it's less convoluted than the few other Excel .NET regex libraries I found out there.

See the .Net regular expression documentation for a full description of regexes, including search and replacement patterns, and how options work.

The formula use and syntax show up in the Excel Intellisense UI, and below for reference:

Installation:

  • Download the \*-packed.xll and ExcelDna.IntelliSense.dll files from the project repository Releases page
  • Copy the appropriate \*.xll add-in file (32 or 64 bit) to your Excel add-ins folder (%APPDATA%\Microsoft\AddIns on Windows).
  • Copy the ExcelDna.IntelliSense.dll file into the add-ins folder.
  • Turn on the add-in in the Excel manage addins dialog (Alt+t,i).

Documentation:

Function RegexEscape()

RegexEscape( text )

Return the input text with special characters escaped. Useful to construct regex patterns with arbitrary text that will not be interpreted for special pattern interpretations, such as when a string includes "[", "$", etc.

Function RegexMatch()

RegexMatch( input, pattern [, options [, replacement ] ] )

Finds and returns the text of the first instance of the regular expression pattern inside the input string, optionally modified with the option flags, and optionally with a replacement pattern.

Parameters

The options are bit flags (see below), and sould be added up to specify more than one optoin. E.g.: ignore case plus multilines is 3 (1 + 2).

  • 1 = IgnoreCase
  • 2 = Multiline
  • 4 = ExplicitCapture
  • 8 = Compiled
  • 16 = Singleline
  • 32 = IgnorePatternWhitespace
  • 64 = RightToLeft
  • 256 = ECMAScript
  • 512 = CultureInvariant

if not specified, the replacement patterns defaults to "$0".

Returns:

  • First instance of text in input that conforms to the input pattern and options, optionally modified by a replacement pattern
  • #VALUE error if the input or pattern are empty strings
  • #NA error if the pattern is not found

Function RegexMatches()

RegexMatches( input, pattern [, options [, replacement ] ] )

Finds all the occurrences of the pattern in the input. Returns delimiter-separated list of matches with optional replacement pattern.

Parameters

  • Same as RegexMatch(), plus...
  • delimiter: Delimiter for the list of results, default ','

Returns:

  • String with delimiter-separated list of matches found, optionally modified by replacement pattern.

Function IsRegexMatch()

IsRegexMatch( input, pattern [, options ] )

Parameters

  • Same as RegexMatch(), except for replacement, which is not used.

Returns:

TRUE if the pattern is found in the input, FALSE otherwise.

Function RegexMatchGroups()

RegexMatchGroups( input, pattern [, options [, MaxMatches [, MaxGroups [, IncludeDuplicates ] ] ] ] )

Search the input for matches of the pattern, return a comma delimited list of matching capture group names/numbers in match order.

Useful to find out what chunks of a regular expression were matched against, without actually caring what the text that matched was. I personally use it a lot to label data, see the examples below.

Parameters:

  • The input, pattern, and options parameters are the same as for RegexMatch().
  • MaxMatches: Maximum number of matches to execute on the input (omit or 0 to return all matches)
  • MaxGroups: Maximum number of group names or numbers to return for each match (omit or 0 for all groups)
  • IncludeDuplicates: Default TRUE: Print group names every time they're found in a match. FALSE: Only return the first instance of each capture group.

Notes:

Examples:

The formula:

  • =RegexMatchGroups('liliac,red,mauve,green','<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)')
  • returns: artsy,primary,artsy,primary

To only return the first match, you'd use:

  • =RegexMatchGroups('mauve,red,green','<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)',,1)
  • returns: artsy

To return all matches but not repeat group names, you set IncludeDuplicates=FALSE:

  • =RegexMatchGroups('mauve,red,green','<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)',,,,FALSE)
  • returns: artsy,primary

The MaxGroups sets the max number of groups per match. Handy, for example, when different subexpressions may match the same text, and you only care for the first group that does. For example, see the difference between not using MaxGroups:

  • =RegexMatchGroups('mauve,red,green','(?<funky>green|lilac)<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)')
  • returns: artsy,primary,funky,primary

... and setting MaxGroups=1:

  • =RegexMatchGroups('mauve,red,green','(?<funky>green|lilac)<?<primary>red|green|blue)(?<artsy>mauve|lilac|haze)',,,1)
  • returns: artsy,primary,funky

Function RegexGroupMatches()

RegexMatchGroups( input, pattern [, options [, MaxMatches [, MaxGroups [, IncludeDuplicates [, GroupNamesTransformPattern [, GroupNamesTransformReplacement ] ] ] ] ] ] )

Search the input for matches of the pattern, return a comma delimited list of matching capture group names/numbers in capture group order within the pattern.

The difference between RegexMatchGroups() and RegexGroupMatches() is that the first reports the results in the order of matches in the input, while the latter reports the results in the order of the search pattern. Besides, as a convenience, it allows to do a match/replace on the returned group names.

Parameters

  • The input, pattern, options, MaxMatches, and IncludeDuplicates parameters are the same as for RegexMatchGroups().
  • MaxGroups: Maximum total number of group names or numbers to return (omit or 0 for all groups).
  • GroupNamesTransformPattern: Transform group names on output list, regex pattern for search
  • GroupNamesTransformReplacement: Transform group names on output list, regex replacement pattern

Notes

  • This function is useful to check for multiple pattern matches using a single regular expression, give each pattern a name in a capture group, and return the names of the groups that match, more importantly: in the order of the definition of the capture groups. This makes it a very succint mechanism to tag data with arbitrary precedence rules.

Examples

To Do: Include examples from CalIns project showing rule precedence

Function RegexReplace()

RegexReplace( input, pattern [, options [, replacement ] ] )

Finds all the instances of the search pattern in the input text, optionally modified with the option flags, replaces them with the replacement pattern, and returns the modifed input. Similar to RegexMatch() but searching and replacing the entire input, and the replacement pattern default is an empty string ("").

The parameters, and options for RegexReplace() are the same as for RegexMatch(), except for the replacement default.

Returns:

  • The input text with every instance of the search pattern + options replaced by the replacement pattern
  • #VALUE error if the input or pattern are empty strings

General notes:

  • If you ever get a #NUM error, it's an internal error and shouldn't happen. Please raise an issue if it does, with replicable example(s).