Multi-domain errors cause sitemapindex XML confusion
NiklasBr opened this issue · 0 comments
PHP version(s) affected: 8.1.13
Package version(s) affected: 3.3.0
Description
In a Symfony 5.4-based application, multiple sites with separate domains share a single /public
directory. For example:
- 1.example.com
- 2.example.com
- 3.example.com
For each of these sites we run the following command (manually or via cron):
bin/console presta:sitemaps:dump --section site_1 --base-url https://1.example.com/ var/tmp/sitemaps
bin/console presta:sitemaps:dump --section site_2 --base-url https://2.example.com/ var/tmp/sitemaps
bin/console presta:sitemaps:dump --section site_3 --base-url https://3.example.com/ var/tmp/sitemaps
After the first command (--section site_1) has completed, the index XML is updated as expected:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://1.example.com/sitemap.site_1.xml</loc>
<lastmod>2023-01-18T16:20:49+01:00</lastmod>
</sitemap>
</sitemapindex>
After the second command (--section site_2) has completed, every domain in the index XML file changes. The content of the urlset https://2.example.com/sitemap.site_2.xml is correct and has the correct base URLs for all locations, but the index XML now uses the site_2 base URL for every entry:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://2.example.com/sitemap.site_2.xml</loc>
<lastmod>2023-01-18T16:23:22+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://2.example.com/sitemap.site_1.xml</loc>
<lastmod>2023-01-18T16:20:49+01:00</lastmod>
</sitemap>
</sitemapindex>
After the third command (--section site_3) has completed, every domain in the index XML file changes again. The content of the urlset https://3.example.com/sitemap.site_3.xml is correct and has the correct base URLs for all locations, but the index XML now uses the site_3 base URL for every entry:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://3.example.com/sitemap.site_3.xml</loc>
<lastmod>2023-01-18T16:27:28+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://3.example.com/sitemap.site_2.xml</loc>
<lastmod>2023-01-18T16:23:22+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://3.example.com/sitemap.site_1.xml</loc>
<lastmod>2023-01-18T16:20:49+01:00</lastmod>
</sitemap>
</sitemapindex>
This is where the error occurs: when the commands are run again, e.g. the next day to periodically regenerate the files, the new entry is added on top of the previous ones instead of replacing the existing entry for that section, so sitemap.site_1.xml is listed twice:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://1.example.com/sitemap.site_1.xml</loc>
<lastmod>2023-01-18T16:33:48+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://1.example.com/sitemap.site_3.xml</loc>
<lastmod>2023-01-18T16:27:28+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://1.example.com/sitemap.site_2.xml</loc>
<lastmod>2023-01-18T16:23:22+01:00</lastmod>
</sitemap>
<sitemap>
<loc>https://1.example.com/sitemap.site_1.xml</loc>
<lastmod>2023-01-18T16:20:49+01:00</lastmod>
</sitemap>
</sitemapindex>
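To check whether a generated index is affected, something like the following could be run against the file. This is only a rough sketch in Python (not part of the bundle, which is PHP); the function name `duplicate_locs` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by sitemap index files, as seen in the XML above.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def duplicate_locs(index_xml):
    """Return every <loc> value listed more than once in a sitemap index.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(index_xml.encode("utf-8"))
    counts = Counter(
        (loc.text or "").strip() for loc in root.findall("sm:sitemap/sm:loc", NS)
    )
    # Keep only the URLs that appear in more than one <sitemap> entry.
    return [url for url, n in counts.items() if n > 1]
```

Fed the last index shown above, this should report https://1.example.com/sitemap.site_1.xml, since that location appears in two entries.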
How to reproduce
I think the full description above should do it.
Possible Solution
Maybe tag each <sitemap> in the index XML with its specific section, such as <sitemap id="site_1">, and use that to identify whether to update an existing entry or add a new one?
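To illustrate the idea of keying index entries by section, here is a minimal sketch in Python (for illustration only; the bundle itself is PHP and this is not its code). Instead of an id attribute, it derives the section from the file name, assuming each section produces exactly one file named sitemap.<section>.xml as in the examples above. The names `section_of` and `upsert_section` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize elements in the sitemap namespace without a prefix.
ET.register_namespace("", NS_URI)


def section_of(loc):
    """Hypothetical helper: extract 'site_1' from '.../sitemap.site_1.xml'."""
    match = re.search(r"/sitemap\.(.+?)\.xml(?:\.gz)?$", loc)
    return match.group(1) if match else None


def upsert_section(index_xml, section, loc, lastmod):
    """Replace the index entry for `section`, leaving other entries untouched."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    # Drop every existing entry that belongs to this section.
    for sitemap in list(root.findall(f"{{{NS_URI}}}sitemap")):
        old_loc = sitemap.findtext(f"{{{NS_URI}}}loc", default="")
        if section_of(old_loc.strip()) == section:
            root.remove(sitemap)
    # Append the fresh entry with its own base URL.
    entry = ET.SubElement(root, f"{{{NS_URI}}}sitemap")
    ET.SubElement(entry, f"{{{NS_URI}}}loc").text = loc
    ET.SubElement(entry, f"{{{NS_URI}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")
```

With this approach, re-dumping site_1 would replace its old entry rather than add a second one, and the entries for site_2 and site_3 would keep their own domains.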
Additional Context
n/a