bbottema/simple-java-mail

Javadoc regarding default setting for email validation contradicting code, which is correct?

bbottema opened this issue · 2 comments

8370db1#commitcomment-16375443

The following javadoc and code show what the default email validation strictness is set to:

    /**
     * The default setting is not strictly 2822 compliant. For example, it does not include the {@link #ALLOW_DOMAIN_LITERALS} criteria, which results in
     * exclusions on single domains.
     * <p>
     * Included in the defaults are: <ul> <li>{@link #ALLOW_QUOTED_IDENTIFIERS}</li> <li>{@link #ALLOW_PARENS_IN_LOCALPART}</li> </ul>
     */
    public static final EnumSet<EmailAddressCriteria> DEFAULT = of(ALLOW_DOMAIN_LITERALS);
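
To make the mismatch concrete, here is a minimal sketch of the two candidate defaults side by side. The `EmailAddressCriteria` enum below is a local stand-in containing only the constants named in the Javadoc above, and the names `DefaultMismatch`, `DOCUMENTED_DEFAULT` and `CODED_DEFAULT` are made up for illustration; none of this is the library's actual source.

```java
import java.util.Collections;
import java.util.EnumSet;

public class DefaultMismatch {
    // Stand-in enum (assumption): only the constants named in the Javadoc above.
    enum EmailAddressCriteria {
        ALLOW_DOMAIN_LITERALS, ALLOW_QUOTED_IDENTIFIERS, ALLOW_PARENS_IN_LOCALPART
    }

    // The default as the Javadoc describes it.
    static final EnumSet<EmailAddressCriteria> DOCUMENTED_DEFAULT = EnumSet.of(
            EmailAddressCriteria.ALLOW_QUOTED_IDENTIFIERS,
            EmailAddressCriteria.ALLOW_PARENS_IN_LOCALPART);

    // The default as the code assigns it.
    static final EnumSet<EmailAddressCriteria> CODED_DEFAULT =
            EnumSet.of(EmailAddressCriteria.ALLOW_DOMAIN_LITERALS);

    public static void main(String[] args) {
        System.out.println("documented: " + DOCUMENTED_DEFAULT);
        System.out.println("coded:      " + CODED_DEFAULT);
        // Reports whether the two sets have no criteria in common.
        System.out.println("disjoint:   "
                + Collections.disjoint(DOCUMENTED_DEFAULT, CODED_DEFAULT));
    }
}
```

The two sets do not overlap at all, so one of them has to change.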

However, I'm not sure what the default actually should be. Do we even need a default? What is its purpose?

Initially I thought a default stricter than the RFC would be needed, so that only the more mundane email strings that mainstream services and servers can handle are accepted, rather than the exotic strings the RFC would allow.

What should be the default?

Don't know if this helps, but:

I'm not sure what your use cases are for simple-java-mail, but the original options in EmailAddress were there to cover a few basic use cases:

  1. user wants to scrape as much data from a possibly-ugly address as they can and make a sensible address from it; these users typically allow all kinds of addresses (except perhaps for single-domain addresses) because in the wild, legitimate senders often violate 2822. E.g., if your goal is to parse spammy emails for analysis, you may want to allow every variation out there just so you can parse something useful.

  2. user wants to check whether an email address has proper, normal syntax; e.g. checking the value entered in a form. These users typically make everything strict, since what most people consider a "valid" email address is a drastic subset of 2822. For users with the strictest requirements, EmailAddress may not be the best fit, since it might be too 'tolerant' for their needs. (Most people use a simple blah@blah.blah.com type regex, which as we of course know is rarely a good idea either: http://www.troyhunt.com/2013/11/dont-trust-net-web-forms-email-regex.html )

  3. user wants to intelligently parse a possibly-ugly address with the goal being a cleaned-up usable address that other software (MTAs, databases, whatever) can use/parse without breaking; sounds like this is maybe your use case? If so, the defaults specified at https://www.boxbe.com/freebox/jdoc/com/boxbe/pub/email/EmailAddress.html are what made sense to me (with the possible exception of ALLOW_DOT_IN_ATEXT, to taste.) In our experience they allowed "real" addresses the highest percentage of the time, and the addresses they failed on were almost all ridiculous.
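
As a rough sketch, the three use cases could map onto `EnumSet` presets like the ones below. The enum is a local stand-in holding only the constants mentioned in this thread, and the class and method names (`CriteriaPresets`, `lenient`, `strict`, `cleanup`) are hypothetical. The `cleanup` preset assumes the two criteria listed in the Javadoc quoted in the issue; check it against the linked EmailAddress documentation before relying on it.

```java
import java.util.EnumSet;

public class CriteriaPresets {
    // Stand-in enum (assumption): only the constants mentioned in this thread.
    enum EmailAddressCriteria {
        ALLOW_DOMAIN_LITERALS,
        ALLOW_QUOTED_IDENTIFIERS,
        ALLOW_PARENS_IN_LOCALPART,
        ALLOW_DOT_IN_ATEXT
    }

    // Use case 1: scrape whatever is out there, so enable every criterion.
    static EnumSet<EmailAddressCriteria> lenient() {
        return EnumSet.allOf(EmailAddressCriteria.class);
    }

    // Use case 2: form-style validation, so enable nothing.
    static EnumSet<EmailAddressCriteria> strict() {
        return EnumSet.noneOf(EmailAddressCriteria.class);
    }

    // Use case 3: clean up ugly but real addresses for other software.
    // Assumed contents; ALLOW_DOT_IN_ATEXT could be added "to taste".
    static EnumSet<EmailAddressCriteria> cleanup() {
        return EnumSet.of(
                EmailAddressCriteria.ALLOW_QUOTED_IDENTIFIERS,
                EmailAddressCriteria.ALLOW_PARENS_IN_LOCALPART);
    }

    public static void main(String[] args) {
        System.out.println("lenient: " + lenient());
        System.out.println("strict:  " + strict());
        System.out.println("cleanup: " + cleanup());
    }
}
```

Keeping the presets as plain sets means a caller can still start from one and add or remove a single criterion for their own case.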

Again, not sure if this is what you were asking, but maybe it's useful.

@chconnor I included parts of your comment in the code to clarify the defaults and the use cases. That's good enough for me.