Cretezy/linkify

Email RegExp should be simpler.

komapeb opened this issue · 7 comments

Currently, the only way to know if an email address is valid is to send an email to it (and potentially wait for an action, like clicking a link to validate, etc.).

tl;dr

Please use this RegExp:

r'.+@.+'
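
A minimal Dart sketch of what this pattern accepts, assuming it is used as a plain "does this look like an address" check. The helper name `looksLikeEmail` is made up for illustration and is not part of linkify.

```dart
// Hypothetical helper, not part of linkify: a deliberately permissive check
// that only requires at least one character on each side of an '@'.
final RegExp simpleEmail = RegExp(r'.+@.+');

bool looksLikeEmail(String input) => simpleEmail.hasMatch(input);

void main() {
  const samples = [
    'user@localserver', // dotless host, accepted
    'user@[2001:DB8::1]', // IPv6 literal, accepted
    'no-at-sign', // rejected: no '@'
    '@example.com', // rejected: nothing before '@'
  ];
  for (final s in samples) {
    print('$s -> ${looksLikeEmail(s)}');
  }
}
```

Because `.+` is greedy and the pattern is unanchored, a match inside running text would also swallow the words around the address, so a linkifier would likely still need some delimiter on top of this.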

(No need for mailto: either.)

Boring stuff below

Some examples for perfectly valid email addresses:

criscrisaaaa@gmail.com.es
mminighin@alpenite.com
!mminighin@alpenite.com
#mminighin@alpenite.com
$mminighin@alpenite.com
%mminighin@alpenite.com
&mminighin@alpenite.com
'mminighin@alpenite.com
*mminighin@alpenite.com
+mminighin@alpenite.com
-mminighin@alpenite.com
/mminighin@alpenite.com
=mminighin@alpenite.com
?mminighin@alpenite.com
^mminighin@alpenite.com
_mminighin@alpenite.com
`mminighin@alpenite.com
{mminighin@alpenite.com
|mminighin@alpenite.com
}mminighin@alpenite.com
~mminighin@alpenite.com
0mminighin@alpenite.com
1mminighin@alpenite.com
2mminighin@alpenite.com
3mminighin@alpenite.com
4mminighin@alpenite.com
5mminighin@alpenite.com
6mminighin@alpenite.com
7mminighin@alpenite.com
8mminighin@alpenite.com
9mminighin@alpenite.com
10mminighin@alpenite.com
prettyandsimple@example.com
very.common@example.com
someuser@ai.
disposable.style.email.with+symbol@example.com
other.email-with-dash@example.com
fully-qualified-domain@example.com
user.name+tag+sorting@example.com
x@example.com
"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
example-indeed@strange-example.com
admin@mailserver1
#!$%&'*+-/=?^_`{}|~@example.org
"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"@example.org
example@s.solutions
user@localserver
user@[2001:DB8::1]

These addresses above are all valid!

If you really, really want to be more or less compatible with the RFCs (this covers 99.99% of cases), you can use this RegExp (I use it in production, but lately I have been considering dropping it):

r'^([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])*\x22)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])*\x22))*\x40([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])*\x5d)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])*\x5d))*\.?$'


I agree with you. I will simplify the regex in the next release

Cool, thank you! Just a simple note: on second thought, mailto: should be included to prevent accidental link conversions. Or better yet, maybe let users of the package specify their own RegExp and just provide defaults for convenience. Something like:

List<LinkifyElement> linkify(
  String text, {
  bool humanize,
  List<LinkType> linkTypes,
  RegExp urlRegex,
  RegExp emailRegex,
}) {
  ...
}

Then just check if the arguments are non-null and assign where applicable.
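
A rough sketch of that null check in null-safe Dart, under stated assumptions: `findEmails` and `_defaultEmailRegex` are hypothetical names, the default pattern is a placeholder chosen for the example, and none of this is linkify's actual API.

```dart
// Hypothetical names throughout; this is not linkify's actual API.
// Placeholder default: non-space, non-'@' characters around a single '@'.
final RegExp _defaultEmailRegex = RegExp(r'[^\s@]+@[^\s@]+');

/// Returns every substring of [text] matched by [emailRegex],
/// falling back to the default pattern when the caller passes none.
List<String> findEmails(String text, {RegExp? emailRegex}) {
  final regex = emailRegex ?? _defaultEmailRegex;
  return regex.allMatches(text).map((m) => m.group(0)!).toList();
}

void main() {
  // Uses the default pattern.
  print(findEmails('write to user@localserver today'));

  // Caller-supplied pattern that insists on a mailto: prefix.
  print(findEmails(
    'write to mailto:user@localserver today',
    emailRegex: RegExp(r'mailto:[^\s@]+@[^\s@]+'),
  ));
}
```

The `??` operator keeps the default in one place, so callers who are happy with it never have to mention a regex at all.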

Yeah, I'm slowly working on implementing custom regexes. Haven't had much time to dedicate to this project recently, I'll try to get to it soon!

Custom linkifiers are out now! You can replace the whole email parser if you'd like.

I ran into an issue with the email regex included in this library a while ago so I had to run my own regex. I don't remember exactly what the issue was but I think it was interfering with the URL regex. There were some cases where the email was ignored and only the domain portion after the "@" was parsed.

Here's the Regex I'm using that works well for me.
const emailPattern = r"\b[\w\.-]+@[\w\.-]+\.\w{2,4}\b";
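
To make the trade-off of this pattern concrete, here is a small check against three addresses taken from the list earlier in this issue. The pattern is copied verbatim from the comment above; the surrounding `main` is only illustrative scaffolding.

```dart
const emailPattern = r"\b[\w\.-]+@[\w\.-]+\.\w{2,4}\b";

void main() {
  final regex = RegExp(emailPattern);
  const addresses = [
    'x@example.com', // last label has 3 characters
    'example@s.solutions', // last label has 9 characters
    'admin@mailserver1', // host has no dot at all
  ];
  for (final address in addresses) {
    print('$address -> ${regex.hasMatch(address)}');
  }
}
```

The trailing `\.\w{2,4}\b` requires a dot followed by a final label of two to four word characters, so this pattern accepts the first address and rejects the other two. That may be a reasonable trade for avoiding false positives in running text, but it is stricter than the addresses this issue argues for.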

I just ran into an edge case too: for us@dara.network, only us@dara.netw gets linkified.

@devxpy #36 was just merged with a fix for this. Will be included in the next release.

If anyone wants more flexible email parsing, please open a PR (I'm quite limited on time).