Domain validator allows invalid characters in some cases
Issue closed · 4 comments
hagenrd commented
It appears that, currently, any character is accepted as the final character of the gTLD when rfc_1034 is True, for example:
>>> domain('example.com?', rfc_1034=True, rfc_2782=False)
True
>>> domain('example.com!', rfc_1034=True, rfc_2782=False)
True
I believe the '.' just needs to be escaped in the pattern string:
+ rf"[a-z]{r'.?$' if rfc_1034 else r'$'}",
             ^
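To illustrate the difference, here is a minimal sketch that uses a shortened stand-in for the pattern. The names LOOSE and STRICT and the simplified regexes are assumptions for illustration only, not the library's actual pattern:

```python
import re

# Simplified stand-ins, NOT the library's real pattern.
# An unescaped '.' matches any character except a newline.
LOOSE = re.compile(r"^[a-z]+\.[a-z]+.?$")
# An escaped '\.' matches only a literal dot (the optional trailing dot).
STRICT = re.compile(r"^[a-z]+\.[a-z]+\.?$")

for candidate in ("example.com", "example.com.", "example.com?", "example.com!"):
    # Print whether each variant accepts the candidate.
    print(candidate, bool(LOOSE.match(candidate)), bool(STRICT.match(candidate)))
```

With the dot escaped, only a trailing '.' is accepted after the TLD, while 'example.com?' and 'example.com!' are rejected.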
Also, it appears that question marks are accepted in domain validation when rfc_2782 is True:
>>> from validators import domain
>>> domain('example?.com', rfc_1034=False, rfc_2782=True)
True
This appears to come from the use of '?' after the '_' inside a character class:
rf"^(?:[a-z0-9{r'_?'if rfc_2782 else ''}]"
                  ^
Presumably, this is to make the '_' optional, but since most metacharacters, including '?', lose their special meaning inside a character class, it is interpreted as a literal '?' instead.
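A quick way to see this behaviour is with two reduced character classes. The names WITH_QMARK and WITHOUT_QMARK and the shortened regexes below are hypothetical and only approximate the relevant fragment of the pattern:

```python
import re

# Simplified stand-ins, NOT the library's real pattern.
# Inside [...], '?' is a literal character, so this class accepts '?'.
WITH_QMARK = re.compile(r"^[a-z0-9_?]+$")
# Dropping the '?' leaves '_' as the only extra allowed character.
WITHOUT_QMARK = re.compile(r"^[a-z0-9_]+$")

for label in ("example", "_sip", "example?"):
    # Print whether each variant accepts the label.
    print(label, bool(WITH_QMARK.match(label)), bool(WITHOUT_QMARK.match(label)))
```

Since a character class already matches a single character from the set, the '_' needs no quantifier to be optional there; removing the '?' should be enough to reject 'example?' while still allowing '_sip'.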
hagenrd commented
Thanks for the quick turnaround! Any chance you have some time to provide a patch that includes those changes?
yozachar commented
hagenrd commented
Sorry, I meant patch in terms of a semantic version, i.e. a patch-release (0.28.1).