aim42/htmlSanityCheck

"Illegal characters" warning appears to be spurious

Closed this issue · 3 comments

When running htmlSanityCheck on a document that contains:

You may wish to join the
<a href="https://groups.google.com/forum/#!forum/randoop-discuss">mailing
list</a>.

I get the warning:

205 href checked, 1 missing id found.
link "https://groups.google.com/forum/#!forum/randoop-discuss" contains illegal characters (Suggestions: optiongroup:Logging,-notifications,-and-troubleshooting-Randoop)

This URL does work in a browser, and quoting (percent-encoding) the characters, as in https://groups.google.com/forum/%23!forum/randoop-discuss, produces a URL that does not work. So it seems that htmlSanityCheck should not issue this warning.

Some suggestions:

  • The error message should indicate exactly which character(s) are illegal.
  • It would be nice to give a quoting suggestion.
  • If there is a bug and this warning should not be produced, then it would be nice to fix that so that I don't always see this spurious error.
  • If the bug cannot be fixed, please provide a way to suppress/ignore specific errors, so that my build runs cleanly.
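To illustrate the first suggestion, here is a minimal sketch of how a checker could collect the offending characters so that the message can name them. The class name, the method name, and the allow-list (the RFC 3986 unreserved and reserved characters plus "%") are assumptions made for this example, not htmlSanityCheck's actual code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class IllegalCharReportSketch {
    // Assumed allow-list: RFC 3986 unreserved and reserved characters, plus '%'.
    private static final String ALLOWED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
        + "-._~:/?#[]@!$&'()*+,;=%";

    // Hypothetical helper: returns each disallowed character once, in order of first appearance.
    public static Set<Character> findInvalidChars(String link) {
        Set<Character> invalid = new LinkedHashSet<>();
        for (char c : link.toCharArray()) {
            if (ALLOWED.indexOf(c) < 0) {
                invalid.add(c);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        String link = "https://example.com/a b|c";
        Set<Character> invalid = findInvalidChars(link);
        if (!invalid.isEmpty()) {
            System.out.println("link \"" + link + "\" contains illegal characters: " + invalid);
        }
    }
}
```

With a report like this, the user can see at a glance whether the flagged character is really a problem or whether the check itself is too strict.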

Thanks!

Michael, thanks for the error report.

The current check is really naive. It is implemented in

 URLUtil: public static boolean containsInvalidChars(String aLink) {

by a (too) simple regex.

As a simple (and known-to-be-imperfect) fix, I am modifying this regex to let "!" pass as a legal character.

I will include a regression test in URLUtilSpec.groovy.
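For readers unfamiliar with this kind of check, the following is a rough sketch of a regex-based allow-list that accepts "!" together with a regression-style check for the URL from the report. It is not the actual URLUtil code: the class name and the pattern are assumptions, with the character set taken from the RFC 3986 unreserved and reserved characters plus "%".

```java
import java.util.regex.Pattern;

public class UrlCharCheckSketch {
    // Assumed allow-list, not the real URLUtil pattern:
    // RFC 3986 unreserved chars, gen-delims, sub-delims (which include '!'), and '%'.
    private static final Pattern ALLOWED =
        Pattern.compile("[A-Za-z0-9\\-._~:/?#\\[\\]@!$&'()*+,;=%]*");

    // Returns true if the link contains any character outside the allow-list.
    public static boolean containsInvalidChars(String aLink) {
        return !ALLOWED.matcher(aLink).matches();
    }

    public static void main(String[] args) {
        // The URL from the report: '#' and '!' are both allowed, so no warning.
        System.out.println(containsInvalidChars(
            "https://groups.google.com/forum/#!forum/randoop-discuss")); // prints false
        // A raw space is not in the allow-list, so this link is flagged.
        System.out.println(containsInvalidChars("https://example.com/a b")); // prints true
    }
}
```

An allow-list like this remains imperfect: it checks characters only, not whether they appear in a position where the URL grammar permits them.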

@mernst - I committed and pushed version 1.1.1 to the plugin portal. Please upgrade your build configuration ...

That works! Thank you very much.