Error: Invalid character in entity name
Closed this issue · 2 comments
awhitford commented
Code like this:
const webDataConnector = createDataConnector({
  provider: "web-scraper",
})
webDataConnector.setOptions({
  urls: ["http://localhost:3000"],
  mode: "sitemap",
})
const documents = await webDataConnector.getDocuments();
is creating the error:
Scraping data from http://localhost:3000
(node:53598) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Error processing http://localhost:3000: Error: Invalid character in entity name
Line: 5
Column: 2516
Char: &
Found 0 urls in sitemap
[]
But http://localhost:3000/sitemap.xml is not empty: it does have URLs, and it does not contain an ampersand (&) character.
I am unclear what line 5 and column 2516 refer to; I don't see an ampersand.
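One way to find out what the reported position refers to is to look at the text the parser actually received. Since the scraper was given the site root rather than the sitemap, the error position plausibly points into the root page's HTML, where a bare & (for example in a query string) is not valid XML. The sketch below is a hypothetical helper, not part of the data connector library; it assumes the reported line and column are 1-based, which may not match the parser's convention, so it returns a small window around the position rather than a single character.

```typescript
// Hypothetical diagnostic helper (not part of the library under discussion).
// Returns the text surrounding a reported line/column position.
// Assumption: line and column are both 1-based; some parsers count from 0
// or report the position just after the offending character.
function contextAt(
  text: string,
  line: number,
  column: number,
  radius = 20,
): string {
  const lines = text.split(/\r?\n/);
  const target = lines[line - 1];
  if (target === undefined) {
    throw new RangeError(
      `Line ${line} is out of range (text has ${lines.length} lines)`,
    );
  }
  const index = Math.max(0, column - 1);
  const start = Math.max(0, index - radius);
  const end = Math.min(target.length, index + radius + 1);
  return target.slice(start, end);
}

// Fetches a URL and shows what sits at the reported position.
// Uses the global fetch available in Node 18+ and in browsers.
async function inspectPosition(
  url: string,
  line: number,
  column: number,
): Promise<string> {
  const response = await fetch(url);
  const body = await response.text();
  return contextAt(body, line, column);
}
```

Calling inspectPosition("http://localhost:3000", 5, 2516) against the running site should show whether the ampersand comes from the root page rather than from sitemap.xml.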
awhitford commented
Changing the url value from http://localhost:3000 to http://localhost:3000/sitemap.xml solved the problem. (I thought that the url parameter would get sitemap.xml appended when mode is sitemap.)
nickscamara commented
That's a good idea, @awhitford. I will add a fallback that tries to locate the sitemap when sitemap mode is on, even if the sitemap URL is not provided.
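Until such a fallback exists, the same effect can be approximated on the caller's side. The following is a minimal sketch, not the library's implementation: toSitemapUrl is a hypothetical helper, and it assumes the sitemap lives at /sitemap.xml on the site's origin, which is conventional but not guaranteed.

```typescript
// Hypothetical helper (not part of the library): turn a site URL into a
// sitemap URL. Assumption: the sitemap is served at /sitemap.xml on the
// same origin. URLs that already point at an .xml (or .xml.gz) file are
// returned unchanged.
function toSitemapUrl(input: string): string {
  const url = new URL(input);
  if (/\.xml(\.gz)?$/i.test(url.pathname)) {
    return url.toString();
  }
  return new URL("/sitemap.xml", url.origin).toString();
}
```

The result can then be passed as the entry in urls when mode is "sitemap", for example urls: [toSitemapUrl("http://localhost:3000")]. Sites that publish their sitemap elsewhere (for instance, one listed in robots.txt) would still need the explicit URL.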