XML Special characters aren't being escaped properly
AjaniBilby opened this issue · 5 comments
Describe the bug
The library is not correctly sanitising inputs before putting them in an XML body
To Reproduce
import type { SitemapFunction } from 'remix-sitemap';
import { json } from "@remix-run/node";
export const sitemap: SitemapFunction = async () => {
return [{
lastmod: new Date().toISOString().split('T')[0],
loc: `/news/0`,
news: [{
title: "Something with an & symbol",
publication: {
name: 'Website',
language: 'en'
},
date: new Date().toISOString().split('T')[0]
}]
}]
};
export async function loader() { return json({}); }
Invalid XML output
<url>
<loc>https://example.com/news/0</loc>
<lastmod>2023-10-15</lastmod>
<changefreq>daily</changefreq>
<priority>0.7</priority>
<news:news>
<news:publication>
<news:name>Signplanet</news:name>
<news:language>en</news:language>
</news:publication>
<news:publication_date>2023-10-15</news:publication_date>
<news:title>Something with an & symbol</news:title>
</news:news>
</url>
Expected behavior
<url>
<loc>https://example.com/news/0</loc>
<lastmod>2023-10-15</lastmod>
<changefreq>daily</changefreq>
<priority>0.7</priority>
<news:news>
<news:publication>
<news:name>Signplanet</news:name>
<news:language>en</news:language>
</news:publication>
<news:publication_date>2023-10-15</news:publication_date>
<news:title>Something with an & symbol</news:title>
</news:news>
</url>
Environment (please complete the following information):
- remix-sitemap version
3.0.0
This could be a potetial patch fix, though it's not perfect:
"Some text & stuff <things> here".replaceAll(/[^A-z0-9 ]/g, (c)=>`&#${c.charCodeAt(0).toString()};`)
'Some text & stuff <things> here'
I guess this can be resolved by simply changing this to true, but performance may be worse
remix-sitemap/src/builders/sitemap.ts
Line 115 in 7e529e3
It would indeed slow down performance, but I doubt that it will impact the entire tool meaningfully for it's purpose based on the request per seconds claim.
At least for me the biggest issue is that this library for some reason creates a new database connection for every single route it's generating a sitemap for, which is a much larger bottle neck than any xml processing would be
Ironically it might be faster to use the regex method I provided earlier, because then you can apply it only the fields which might have special characters such as names and titles, and can ignore dates and urls
Yes, but your other comment makes sense, it's not expected to have a bunch of requests in a sitemap. The processEntities
can be true
At least for me the biggest issue is that this library for some reason creates a new database connection for every single route it's generating a sitemap for, which is a much larger bottle neck than any xml processing would be
For this, if you are creating the connection to db in each sitemap function, i assume this is normal.
In a node environment, this should be resolved by connecting outside of the function, sharing the instance. i guess.
In serverless/edge idk maybe solving #6 can help some. I need to test how it works.
Please create another issue to investigate more about this