Crawler prunes anchor content from "content" results
Closed this issue · 1 comment
jdeathe commented
- codelibs/fess@8e987df#diff-eff61ee4803e2a1fad464fea66f4a810R122
- codelibs/fess@8e987df#diff-6a2c7a9b0d9b82a012d5c03fb1ca8ca9R134
Since the addition of a[rel="nofollow"] to the setting crawler.document.html.pruned.tags in the file fess_config.properties, anchor content is being pruned from the indexed content, even for anchors that have no rel="nofollow" attribute.
Content example:
<p>Contact us using our <a href="https://www.domain.com/contact">contact form</a> or if you're visiting, you can <a href="//www.domain.com/location">get directions</a> to our head office.</p>
This is being indexed as:
Contact us using our or if you're visiting, you can get directions to our head office.
jdeathe commented
Using the setting from the test case, where the attribute value is unquoted, works:
crawler.document.html.pruned.tags=noscript,script,style,header,footer,nav,a[rel=nofollow]
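For reference, the behavior expected from a[rel=nofollow] can be sketched in a few lines of Python. This is only an illustration using the standard library's html.parser, not Fess's actual pruning code; the class NofollowPruner and the function extract_text are hypothetical names made up for this example. It drops text only inside anchors whose rel attribute is exactly "nofollow" and keeps the text of every other anchor.

```python
from html.parser import HTMLParser


class NofollowPruner(HTMLParser):
    """Collect text, skipping content inside <a rel="nofollow"> elements.

    Hypothetical helper for illustration; not part of Fess.
    """

    def __init__(self):
        super().__init__()
        self.parts = []
        # One flag per currently open <a>: True if that anchor is pruned.
        self.anchor_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            rel = dict(attrs).get("rel") or ""
            # Exact match, mirroring the attribute selector a[rel=nofollow].
            self.anchor_stack.append(rel == "nofollow")

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_stack:
            self.anchor_stack.pop()

    def handle_data(self, data):
        # Keep text unless it sits inside a pruned anchor.
        if not any(self.anchor_stack):
            self.parts.append(data)


def extract_text(html):
    """Return the visible text of html with nofollow anchors removed."""
    parser = NofollowPruner()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed elements.
    return " ".join("".join(parser.parts).split())
```

Applied to the content example from this issue, where neither anchor has a rel attribute, extract_text keeps both "contact form" and "get directions" in the output.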