lgraubner/sitemap-generator-cli

Ignore “ignore”

elmimmo opened this issue · 4 comments

Is there a way to ignore whatever criteria are being used to ignore URLs and have those in the sitemap too?

Some URLs are ignored if their source code has a line similar to this one:

<meta name="robots" content="noindex , nofollow" />

(which IMHO sitemap-generator-cli should ignore when run with the option --no-respect-robots-txt).

Comment out line 102 in /usr/local/lib/node_modules/sitemap-generator-cli/index.js in order to include those URLs in the sitemap too, like so:

      // /(<meta(?=[^>]+noindex).*?>)/.test(page) || // check if robots noindex is present
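
For anyone wondering what that line actually tests, here is a minimal standalone sketch. The pattern is copied from the line quoted above. The helper name `hasNoindexMeta` and the two sample pages are made up for illustration and are not part of sitemap-generator-cli.

```javascript
// Pattern copied from the commented-out line above.
// It matches a <meta ...> tag that has "noindex" somewhere before its closing ">".
const noindexPattern = /(<meta(?=[^>]+noindex).*?>)/;

// Hypothetical helper, not part of sitemap-generator-cli:
// returns true when the page source contains such a <meta> tag.
function hasNoindexMeta(page) {
  return noindexPattern.test(page);
}

// Sample page sources, invented for this example.
const blocked = '<head><meta name="robots" content="noindex , nofollow" /></head>';
const allowed = '<head><meta name="robots" content="index, follow" /></head>';

console.log(hasNoindexMeta(blocked)); // true
console.log(hasNoindexMeta(allowed)); // false
```

Note that the pattern looks for "noindex" in any `<meta>` tag, not only in one named "robots", so it is a fairly broad check.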

The workaround no longer works - something else is at line 102 now and "noindex" does not occur anywhere in the file.