sandialabs/Arcus

[BUG] MAC address regular expression is wrong

rheone opened this issue · 2 comments

Describe the bug

The RegEx

^(?:[0-9A-Fa-f]{2}([-: ]?))(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$|^(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}$|^(?:[0-9A-Fa-f]{3}\.){3}[0-9A-Fa-f]{3}$

was (poorly) optimized to

^[\dA-F]{2}([ -:]?)(?:[\dA-F]{2}\1){4}[\dA-F]{2}$|^(?:[\dA-F]{4}\.){2}[\dA-F]{4}$|^(?:[\dA-F]{3}\.){3}[\dA-F]{3}$

Related tools: RegEx to Strings, regexp-tree

It matches strings such as:

48%57%82%1A%3A%E5
DA8698328B383C858
8F+23+1F+7D+C2+D8
7D6426E367C6F163A
65)E9)E0)5C)21)51
A9&4A&8D&E1&6A&48
200F70CB0250260CD
8F36E30A34936732D
C2$06$7C$6C$7F$0B
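
These false positives are consistent with how the character class `[ -:]` is read: a hyphen between two characters denotes a range, here from space (U+0020) to colon (U+003A), which includes `$`, `%`, `&`, `)`, `+`, and the digits `0` to `9`. In `[-: ]` the leading hyphen is a literal. The sketch below reproduces the difference in Python; Python's `re` module stands in for the .NET engine, and the names `OPTIMIZED`, `ORIGINAL`, `FALSE_POSITIVES`, and `range_members` are illustrative, not part of Arcus.

```python
import re

# "Optimized" pattern from this issue. [ -:] is a RANGE from ' ' to ':'.
# re.ASCII keeps \d limited to 0-9.
OPTIMIZED = re.compile(
    r"^[\dA-F]{2}([ -:]?)(?:[\dA-F]{2}\1){4}[\dA-F]{2}$"
    r"|^(?:[\dA-F]{4}\.){2}[\dA-F]{4}$"
    r"|^(?:[\dA-F]{3}\.){3}[\dA-F]{3}$",
    re.ASCII,
)

# Original pattern from this issue. In [-: ] the leading hyphen is a literal.
ORIGINAL = re.compile(
    r"^(?:[0-9A-Fa-f]{2}([-: ]?))(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$"
    r"|^(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}$"
    r"|^(?:[0-9A-Fa-f]{3}\.){3}[0-9A-Fa-f]{3}$"
)

# A few of the strings reported above.
FALSE_POSITIVES = [
    "48%57%82%1A%3A%E5",
    "DA8698328B383C858",  # the digit '8' acts as the separator
    "8F+23+1F+7D+C2+D8",
    "65)E9)E0)5C)21)51",
]


def range_members():
    """Return every character covered by the range ' ' to ':'."""
    return "".join(chr(c) for c in range(ord(" "), ord(":") + 1))


if __name__ == "__main__":
    print(repr(range_members()))
    for text in FALSE_POSITIVES:
        print(text, bool(OPTIMIZED.match(text)), bool(ORIGINAL.match(text)))
```

Each listed string is accepted by the optimized pattern and rejected by the original one.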

Expected behavior

It should match (and only match) the formats listed below.

I’ve reviewed common formats and standards, and I believe the regex should be correct for the following formats:

  • IEEE 802 format for printing MAC-48 addresses in six groups of two hexadecimal digits, separated by hyphens -
  • six groups of two hexadecimal digits separated by colons :
  • six groups of two hexadecimal digits separated by spaces
  • 12 hexadecimal digits with no separation
  • Cisco three groups of four hexadecimal digits separated by dots .
  • Cisco four groups of three hexadecimal digits separated by dots .
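
The six formats above can be checked against the original expression with a small table of samples. This is only a sketch: the sample addresses are made up for illustration, `is_mac` is a hypothetical helper name, and Python's `re` is used in place of the .NET engine.

```python
import re

# Original (non-"optimized") expression from this issue.
MAC = re.compile(
    r"^(?:[0-9A-Fa-f]{2}([-: ]?))(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$"
    r"|^(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}$"
    r"|^(?:[0-9A-Fa-f]{3}\.){3}[0-9A-Fa-f]{3}$"
)

# One made-up sample per format in the list above.
SHOULD_MATCH = [
    "00-1A-2B-3C-4D-5E",  # hyphens
    "00:1A:2B:3C:4D:5E",  # colons
    "00 1A 2B 3C 4D 5E",  # spaces
    "001A2B3C4D5E",       # no separation
    "001A.2B3C.4D5E",     # three groups of four, dots
    "001.A2B.3C4.D5E",    # four groups of three, dots
]

SHOULD_NOT_MATCH = [
    "00-1A:2B-3C:4D-5E",  # mixed separators fail the \1 backreference
    "00-1A-2B-3C-4D",     # only five groups
    "001A2B3C4D5G",       # 'G' is not a hexadecimal digit
]


def is_mac(text):
    """Return True if the whole string is in one of the accepted formats."""
    return MAC.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in SHOULD_MATCH + SHOULD_NOT_MATCH:
        print(f"{sample!r}: {is_mac(sample)}")
```

The backreference `\1` is what forces the same separator between every pair, which is why the mixed-separator sample is rejected.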

Bonus points if it groups the hexadecimal digits.
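
One possible reading of "grouping" is to return the six octets regardless of the input format. Capturing them directly in a single expression is awkward because the Cisco dot formats do not split on octet boundaries, so the sketch below validates first and then strips separators. It is an assumption about what grouping could mean here, not the approach taken in Arcus, and `octets` is a hypothetical helper.

```python
import re

# Original (non-"optimized") expression from this issue.
MAC = re.compile(
    r"^(?:[0-9A-Fa-f]{2}([-: ]?))(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$"
    r"|^(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}$"
    r"|^(?:[0-9A-Fa-f]{3}\.){3}[0-9A-Fa-f]{3}$"
)


def octets(text):
    """Validate text, then return its six octets as upper-case hex pairs."""
    if MAC.fullmatch(text) is None:
        raise ValueError(f"not a recognized MAC address format: {text!r}")
    # After validation only hex digits and one kind of separator remain,
    # so removing the separators leaves exactly 12 hexadecimal digits.
    digits = re.sub(r"[-: .]", "", text).upper()
    return [digits[i:i + 2] for i in range(0, 12, 2)]


if __name__ == "__main__":
    print(octets("00:1a:2b:3c:4d:5e"))
    print(octets("001.A2B.3C4.D5E"))
```

Validating before splitting keeps the grouping logic independent of which separator style was used.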

More info may be found on Network Engineering Stack Exchange: "What are the various standard and industry practice ways to express a 48-bit MAC address?"

Proposed solutions

I'll figure something out

Can you help?

I'll figure something out

A fix has been in place as of commit 1ddd6b1, using the original, non-"optimized" expression:

^(?:[0-9A-F]{2}([-: ]?))(?:[0-9A-F]{2}\1){4}[0-9A-F]{2}$|^(?:[0-9A-F]{4}\.){2}[0-9A-F]{4}$|^(?:[0-9A-F]{3}\.){3}[0-9A-F]{3}$

I'd like to take some time to see if this RegEx can be optimized, add further testing, and determine if better grouping can be achieved.

inadvertently left open