daveoncode/python-string-utils

Email validation: domain label constraints and Unicode not handled

devikasondhi opened this issue · 1 comments

Hello,

Here are some scenarios where is_email fails:

  1. localhost and IP-literal domains are rejected: email@localhost and email@[127.0.0.1] are valid, but is_email returns False for both.
  2. Unicode (IDN) domains are not handled. This address should be valid, but is_email returns False: test@domain.with.idn.tld.उदाहरण.परीक्षा (UTF-8 bytes: test@domain.with.idn.tld.\xe0\xa4\x89\xe0\xa4\xa6\xe0\xa4\xbe\xe0\xa4\xb9\xe0\xa4\xb0\xe0\xa4\xa3.\xe0\xa4\xaa\xe0\xa4\xb0\xe0\xa5\x80\xe0\xa4\x95\xe0\xa5\x8d\xe0\xa4\xb7\xe0\xa4\xbe)
  3. Domain labels can't begin or end with a hyphen ('-'). These addresses should be invalid, but is_email returns True for both: example@invalid-.com and example@-invalid.com
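
To illustrate the domain rules from the three points above, here is a rough sketch of a standalone check. It is not the library's implementation: `is_valid_domain` and its helpers are hypothetical names, it assumes Python 3.7+, it relies on the standard library's `idna` codec (IDNA 2003) for non-ASCII labels, and it only covers IPv4 address literals.

```python
import ipaddress
import re

# One DNS label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def _to_ascii_label(label):
    """Return the ASCII (punycode) form of a label, or None if it cannot be encoded."""
    if label.isascii():
        return label
    try:
        return label.encode("idna").decode("ascii")
    except UnicodeError:
        return None


def is_valid_domain(domain):
    """Hypothetical helper: validate the part of an address after the '@'."""
    # Address literal such as [127.0.0.1] (IPv4 only in this sketch).
    if domain.startswith("[") and domain.endswith("]"):
        try:
            ipaddress.IPv4Address(domain[1:-1])
        except ValueError:
            return False
        return True

    # Ordinary host name: every label must be well formed. A single label
    # (e.g. "localhost") is accepted, and no limit is placed on the length
    # of the last label beyond the general 63-character cap.
    for label in domain.split("."):
        ascii_label = _to_ascii_label(label)
        if ascii_label is None or not _LABEL.fullmatch(ascii_label):
            return False
    return True
```

With this sketch, localhost, [127.0.0.1] and the IDN domain are accepted, while invalid-.com and -invalid.com are rejected.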

Also, the local part can contain ASCII characters such as !, ' and / (https://en.wikipedia.org/wiki/Email_address). is_email does not handle this correctly for the input joe!/blow@apache.org.
Further, is_email returns False for the input abc@school.school. There seems to be a limit on the length of the last domain label (nothing longer than 4 characters is accepted).
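
For the local part, a minimal sketch of what I would expect, assuming only the unquoted "dot-atom" form from RFC 5322 (quoted local parts are left out). `is_valid_local_part` is a hypothetical name, not something the library provides.

```python
import re

# Characters allowed in an unquoted local part (RFC 5322 "atext").
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"

# Dot-atom: one or more atext runs separated by single dots,
# so no leading, trailing, or doubled dots.
_DOT_ATOM = re.compile(rf"{_ATEXT}(?:\.{_ATEXT})*")


def is_valid_local_part(local):
    """Hypothetical helper: validate the part of an address before the '@'."""
    # 64 characters is the usual upper bound for the local part.
    return len(local) <= 64 and _DOT_ATOM.fullmatch(local) is not None
```

Under these assumptions joe!/blow is accepted as a local part, while forms such as a..b or .a are rejected.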