lgraubner/sitemap-generator-cli

Parsing Errors

ataylor32 opened this issue · 3 comments

Here is my test HTML (assume that https://example.com/john's.html is a valid URL):

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Test</title>
  </head>
  <body>
    <p><a href="/john&#39;s.html">Link with encoded apostrophe</a></p>
    <iframe src="https://www.facebook.com/plugins/like.php?href=https%3A%2F%2Fgithub.com%2F&amp;send=false&amp;layout=standard&amp;width=450&amp;show_faces=false&amp;action=like&amp;colorscheme=light&amp;font&amp;height=35" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:450px; height:35px;" allowTransparency="true"></iframe>
  </body>
</html>

This is the output I get:

Found: https://example.com/
Not found: https://example.com/john&
Not found: https://example.com/https:%2F%2Fgithub.com%2F&send=false&layout=standard&width=450&show_faces=false&action=like&colorscheme=light&font&height=35%22
Added 1 sites, encountered 2 errors.
Sitemap successfully created!
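For comparison, here is a minimal Python sketch of link extraction that uses a real HTML parser instead of matching on raw text. Both errors above point the same way. The first is a link that was not entity-decoded (`&#39;` was cut off, leaving `john&`). The second appears to come from the `href=` text inside the iframe's query string being treated as a link. A parser avoids both: attribute values arrive with `&#39;` already decoded to `'` and `&amp;` to `&`, and text inside a `src` value is never read as a separate attribute. This is an illustration only, not how sitemap-generator-cli is implemented. The names `LinkExtractor` and `same_host_links` are hypothetical, and keeping only same-host links is an assumption about what a sitemap should contain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect <a href> and <iframe src> targets.

    HTMLParser decodes character references in attribute values, so
    "&#39;" arrives as "'" and "&amp;" arrives as "&".
    """

    # Which attribute holds the URL for each tag we care about.
    LINK_ATTRS = {"a": "href", "iframe": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative URLs such as "/john's.html" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_host_links(html, base_url):
    """Hypothetical helper: return only links on the same host as base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    # Assumption: a sitemap should only list pages on the crawled host,
    # so the Facebook iframe URL is dropped rather than crawled.
    return [url for url in parser.links if urlparse(url).netloc == host]
```

Run against the test HTML above with `https://example.com/` as the base URL, `same_host_links` returns `["https://example.com/john's.html"]`. The iframe URL is still decoded correctly, with `&amp;` turned into `&`, but it is filtered out because it points at `www.facebook.com`.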

This should be fixed by #5, which is coming in the next release.

Please check again with v4.0.0.

Looks good! Thanks!