[ BUG ] Premature end of data in tag url line 1
miladmeidanshahi opened this issue · 2 comments
miladmeidanshahi commented
Hi,
I wrote a script for generating custom sitemap data, but I sometimes get the error below. If I reduce the data to around 500 items it works perfectly, but once the script processes around 3000 items this error appears. When I manually add the missing </url></urlset> end tags to the stored file the problem is fixed, but why?!
xml is invalid Error: Command failed: xmllint --schema /home/sitemap-generator/node_modules/.pnpm/sitemap@7.1.1/node_modules/sitemap/schema/all.xsd --noout -
-:1: parser error : Premature end of data in tag url line 1
AF%DB%8C%D8%AF%DB%8C%20%D9%86%D8%B4%D8%B1%20%D8%B4%D9%85%D8%B4%D8%A7%D8%AF</loc>
^
at ChildProcess.exithandler (node:child_process:419:12)
at ChildProcess.emit (node:events:513:28)
at maybeClose (node:internal/child_process:1091:16)
at ChildProcess._handle.onexit (node:internal/child_process:302:5) {
code: 1,
killed: false,
signal: null,
cmd: 'xmllint --schema /home/milad/Public/Projects/sitemap-generator/node_modules/.pnpm/sitemap@7.1.1/node_modules/sitemap/schema/all.xsd --noout -'
} -:1: parser error : Premature end of data in tag url line 1
AF%DB%8C%D8%AF%DB%8C%20%D9%86%D8%B4%D8%B1%20%D8%B4%D9%85%D8%B4%D8%A7%D8%AF</loc>
My script:
#!/usr/bin/env node
import { createWriteStream, createReadStream } from 'fs'
import yargs from 'yargs'
import { hideBin } from 'yargs/helpers'
import { createGzip } from 'zlib'
import { xmlLint, parseSitemap, SitemapStream } from 'sitemap'

class HTTPResponseError extends Error {
  constructor(response) {
    super(`HTTP Error Response: ${response.status} ${response.statusText}`)
    this.response = response
  }
}

const checkStatus = response => {
  if (response.ok) {
    // response.status >= 200 && response.status < 300
    return response
  } else {
    throw new HTTPResponseError(response)
  }
}

const argv = yargs(hideBin(process.argv)).argv

if (argv.url) {
  if (!/^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,=.]+$/.test(argv.url)) {
    console.log('URL is not valid!')
    process.exit(0)
  }

  try {
    const URL = argv.url

    const sitemap = new SitemapStream({
      hostname: URL,
      lastmodDateOnly: true,
      xmlns: { // XML namespaces to turn on - all by default
        news: true,
        xhtml: true
      }
    })

    sitemap.pipe(createGzip())

    const writeStream = createWriteStream(argv.output ?? './sitemap.xml')
    sitemap.pipe(writeStream)

    const request = await fetch(`${URL}/api/sandbox/settings/sitemap`)
    checkStatus(request)
    const data = await request.json()

    data.products.forEach(({ id, name }) => {
      sitemap.write({
        url: `${URL}/products/${id}/${name}`,
        lastmod: new Date(),
        changefreq: 'weekly',
        priority: 0.9
      })
    })

    data.categories.forEach(category => {
      const typeOfCategory = () => {
        if (category.is_tag) return 'tag'
        if (category.is_brand) return 'brand'
        if (!category.is_brand && !category.is_tag) return 'category'
      }

      sitemap.write({
        url: `${URL}/collections?filter=${typeOfCategory()}&filter_title=${category.name}`,
        changefreq: 'weekly',
        priority: 0.9
      })
    })

    sitemap.end()

    console.log(URL)
    console.log('Products', data.products.length)
    console.log('Categories', data.categories.length)
    console.log('Successfully generated.')

    xmlLint(createReadStream(argv.output ?? './sitemap.xml')).then(
      () => console.log('xml is valid'),
      ([err, stderr]) => console.error('xml is invalid', err, stderr)
    )
  } catch (error) {
    console.error(error)
    const errorBody = await error.response.text()
    console.error(`Error body: ${errorBody}`)
  }
} else {
  console.log('URL is required! pass --url https://example.com')
}
huntharo commented
It's happening because you are not waiting for the streams to close, so any buffered contents are not being flushed to the file. Streams are async, but most Node.js devs do not seem to realize that. Even write() is technically async and needs you to wait for a callback before throwing another 10k writes at it; a sketch of that is below.
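For the per-item writes, a minimal backpressure-aware sketch (assuming an items array built from your API response and the same sitemap stream) would be:

// write() returns false when the internal buffer is full;
// wait for 'drain' before writing more instead of queueing thousands of entries
for (const item of items) {
  if (!sitemap.write(item)) {
    await new Promise(resolve => sitemap.once('drain', resolve))
  }
}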
To fix the race condition you need to add this:
import { finished } from 'stream/promises';
// [...]
await finished(sitemap);
await finished(writeStream);
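Here is a rough sketch of how that fits into your script (keeping your argv.output fallback and the same xmlLint call), so validation only runs once the file is fully flushed:

import { finished } from 'stream/promises'

// ...after all the sitemap.write() calls...
sitemap.end()

// wait for both streams so the closing </url> and </urlset> tags reach the file
await finished(sitemap)
await finished(writeStream)

// the file on disk is complete now, so xmllint no longer sees a truncated document
xmlLint(createReadStream(argv.output ?? './sitemap.xml')).then(
  () => console.log('xml is valid'),
  ([err, stderr]) => console.error('xml is invalid', err, stderr)
)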