lorey/social-media-profiles-regexs

RegEx ASCII Special Characters

albertopfunk opened this issue · 2 comments

While testing some of the patterns, I noticed that you use 'A-z' to match letters. That range matches every ASCII character from 65 to 122, which covers the uppercase and lowercase letters plus the six characters in between (91-96): [, \, ], ^, _, and `. To match only a-z (97-122) and A-Z (65-90), you need to change 'A-z' to 'a-zA-Z'.
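A quick way to see the difference is to test every ASCII code point in the 65-122 range against both character classes. This is only a small sketch using Python's standard `re` module; the variable names are made up for illustration and are not from the repository.

```python
import re

# 'A-z' spans ASCII 65-122; 'a-zA-Z' covers only the two letter ranges.
wide = re.compile(r"[A-z]")
strict = re.compile(r"[a-zA-Z]")

# Collect the characters in 65-122 that the strict class rejects.
extras = [chr(code) for code in range(65, 123) if not strict.fullmatch(chr(code))]

print(extras)  # ['[', '\\', ']', '^', '_', '`']

# Every one of those extra characters is accepted by the wide class.
assert all(wide.fullmatch(ch) for ch in extras)
```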

lorey commented

Hey Alberto, thanks for reaching out.

This is mostly done on purpose because of laziness. These regular expressions are for crawling/scraping, not for validation. And for scraping, higher recall is more important than precision, IMHO. Since I cannot test every special character, I tend to use a superset of the possible characters so that most links still match.

What do you think?
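To make the recall versus precision trade-off concrete, here is a rough sketch of the two use cases side by side. Everything in it is hypothetical: the `example.com` URL shape, the `LOOSE` and `STRICT` patterns, and the 15-character limit are assumptions for illustration, not patterns from this repository or rules of any real platform.

```python
import re

# Hypothetical scraping pattern: deliberately permissive ('A-z' superset)
# so that unusual links are still picked up (higher recall).
LOOSE = re.compile(r"https?://example\.com/([A-z0-9_.-]+)")

# Hypothetical validation pattern: only letters, digits, and underscores
# (higher precision). The length limit is an assumed rule.
STRICT = re.compile(r"[A-Za-z0-9_]{1,15}")


def scrape_handles(html):
    """Return every handle-like path segment found in the page."""
    return LOOSE.findall(html)


def is_valid_handle(handle):
    """Return True only if the whole string fits the strict pattern."""
    return STRICT.fullmatch(handle) is not None


page = (
    '<a href="https://example.com/some_user">a</a> '
    '<a href="http://example.com/odd^name">b</a>'
)

handles = scrape_handles(page)
print(handles)                                     # ['some_user', 'odd^name']
print([h for h in handles if is_valid_handle(h)])  # ['some_user']
```

The loose pattern picks up both links, including the one containing `^`, while the strict check would reject it. For scraping, that extra match can be filtered later; for validation, it would be a false positive.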

Gotcha, that makes sense. I was using them for validation, so apologies for the misunderstanding. I ended up using your examples as a starting point for some patterns I created. Very helpful, thank you!