maldevel/EmailHarvester

Incomplete email addresses returned if incomplete domain provided

Closed this issue · 2 comments

For instance, I ran
python2.7 EmailHarvester.py -d ipno.in2p3 -e all # domain lacks ".fr"

All the results look like:
PDFdosim@ipno.in2p3 # also lacks final ".fr"

It seems the regex for email extraction just trusts the domain provided instead of looking for valid email addresses matching that domain. It's a pity, since the search seems to work and returns relevant (but incomplete) results...
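A minimal sketch of the suspected behaviour. It assumes the extractor builds its pattern as a local part, then "@", then the user-supplied domain verbatim; EmailHarvester's actual regex may differ, and `extract_emails` / `extend_domain` are hypothetical names used only for illustration:

```python
import re

def extract_emails(text, domain, extend_domain=False):
    """Return addresses in `text` whose domain part starts with `domain`.

    Hypothetical helper for illustration; not EmailHarvester's actual code.
    """
    # Local part, "@", then the user-supplied domain taken literally.
    pattern = r"[A-Za-z0-9._%+-]+@" + re.escape(domain)
    if extend_domain:
        # Also consume any further ".label" parts, e.g. the missing ".fr".
        pattern += r"(?:\.[A-Za-z0-9-]+)*"
    return re.findall(pattern, text)

page = "Contact PDFdosim@ipno.in2p3.fr for details."
print(extract_emails(page, "ipno.in2p3"))                      # ['PDFdosim@ipno.in2p3']
print(extract_emails(page, "ipno.in2p3", extend_domain=True))  # ['PDFdosim@ipno.in2p3.fr']
```

Extending the match only recovers the full address when the page text actually contains it; if the search engine result only holds the truncated string, there is nothing more to capture.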

If you search for "@mydomain.c" rather than "@mydomain.com" in Google, Yahoo or any other search engine, you will get results for that exact string, "@mydomain.c", not "@mydomain.com". The only fixes I can make are:

  • check that the user provided a valid domain;
  • check that the email addresses found are truly valid email addresses before adding them to the results list.
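The two checks above could be sketched roughly as follows. This is a syntax-only heuristic under stated assumptions: `is_valid_domain` and `is_valid_email` are hypothetical names, a domain is assumed valid if it has at least two labels and an alphabetic top-level label of two or more letters, and no DNS lookup is done to confirm the domain exists. It would wrongly reject punycode ("xn--") TLDs.

```python
import re

# Hypothetical heuristic patterns (assumptions, not EmailHarvester code).
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
TLD = r"[A-Za-z]{2,63}"
DOMAIN_RE = re.compile(r"^(?:%s\.)+%s\Z" % (LABEL, TLD))
LOCAL_RE = re.compile(r"^[A-Za-z0-9._%+-]+\Z")

def is_valid_domain(domain):
    """True if `domain` has at least two labels and an alphabetic TLD."""
    return bool(DOMAIN_RE.match(domain))

def is_valid_email(address):
    """True if `address` is local@domain with a plausible domain part."""
    local, sep, domain = address.rpartition("@")
    return bool(sep) and bool(LOCAL_RE.match(local)) and is_valid_domain(domain)

found = ["PDFdosim@ipno.in2p3", "PDFdosim@ipno.in2p3.fr"]
print(is_valid_domain("ipno.in2p3"))            # False: "in2p3" is not a TLD
print([e for e in found if is_valid_email(e)])  # ['PDFdosim@ipno.in2p3.fr']
```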

Yes, you're right. I'm not sure which is the best solution. It would be a pity to reject some of the email addresses that were found... but at the same time, for the sake of "correctness", a tool that is supposed to return email addresses should ideally return only valid ones...
Maybe that could be an option? For example, a --strict flag would return only valid email addresses. Also, a quick check of the domain entered as input could raise a warning and automatically switch to "non-strict" mode?
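A rough sketch of how the proposed option might behave. `--strict` is only a suggestion in this thread, not an existing EmailHarvester flag; `filter_results`, `parse_args` and the domain heuristic (last label alphabetic, two or more letters) are hypothetical. If the input domain fails the check, it warns on stderr and falls back to non-strict mode, since strict filtering would otherwise discard every result:

```python
import argparse
import re
import sys

# Hypothetical heuristic: the last label must be alphabetic, 2+ letters.
_DOMAIN_RE = re.compile(r"^(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,63}\Z")

def looks_valid(domain):
    return bool(_DOMAIN_RE.match(domain))

def filter_results(emails, domain, strict):
    """Apply the proposed --strict behaviour to a list of found addresses."""
    if strict and not looks_valid(domain):
        # Strict mode would drop everything, so warn and fall back.
        sys.stderr.write("warning: %r does not look like a complete domain; "
                         "switching to non-strict mode\n" % domain)
        strict = False
    if not strict:
        return list(emails)
    return [e for e in emails if "@" in e and looks_valid(e.rpartition("@")[2])]

def parse_args(argv):
    parser = argparse.ArgumentParser(description="sketch of a --strict option")
    parser.add_argument("-d", dest="domain", required=True)
    parser.add_argument("--strict", action="store_true",
                        help="only keep addresses whose domain looks valid")
    return parser.parse_args(argv)

args = parse_args(["-d", "ipno.in2p3", "--strict"])
found = ["PDFdosim@ipno.in2p3"]
print(filter_results(found, args.domain, args.strict))  # ['PDFdosim@ipno.in2p3'], after a warning
```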