samdark/sitemap

Special Chars in URL

nadar opened this issue · 5 comments

nadar commented

I am not sure, but this throws an exception because of special characters in the URL. Special characters seem to be very common in URLs now (I just asked myself when that changed...).

The location must be a valid URL. You have specified: https://example.com/künstliche-intelligenz

File: samdark/sitemap/Sitemap.php
Line: 243

(The original domain was: https://heartbeat.gmbh, which is a valid domain)

nadar commented

I just tested it; the problem is the umlauts öäü: https://3v4l.org/anvhr
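A minimal PHP reproduction of the failing check. It assumes the library validates locations with PHP's built-in `FILTER_VALIDATE_URL` filter (the code at line 243 is not shown in this thread). That filter only accepts ASCII URLs, so the raw umlaut fails while the percent-encoded form passes:

```php
<?php
// Assumption: the location check behaves like FILTER_VALIDATE_URL,
// which accepts ASCII URLs only.
$raw     = 'https://example.com/künstliche-intelligenz';
$encoded = 'https://example.com/k%C3%BCnstliche-intelligenz'; // ü = UTF-8 bytes C3 BC

var_dump(filter_var($raw, FILTER_VALIDATE_URL));     // bool(false)
var_dump(filter_var($encoded, FILTER_VALIDATE_URL)); // returns the URL string unchanged
```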

According to the specification, URLs should be percent-encoded (escaped): https://www.sitemaps.org/protocol.html#escaping

We can either add URL encoding or improve the error message.
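If the library encoded instead of rejecting, one option is to percent-encode every non-ASCII byte of the UTF-8 URL before validation. The sketch below uses a hypothetical helper, `encodeNonAsciiUrl()`, which is not part of samdark/sitemap. It assumes UTF-8 input and an ASCII host, and it leaves existing `%XX` escapes alone so already-encoded URLs are not double-encoded:

```php
<?php
// Hypothetical helper, not part of samdark/sitemap.
// Assumes $url is UTF-8 and its host is already ASCII; non-ASCII hosts
// would need punycode instead (e.g. idn_to_ascii() from the intl extension).
function encodeNonAsciiUrl(string $url): string
{
    // Without the /u modifier PCRE matches bytes, so each byte of a
    // multi-byte UTF-8 character is escaped separately as %XX.
    return preg_replace_callback(
        '/[\x80-\xFF]/',
        static fn (array $m): string => sprintf('%%%02X', ord($m[0])),
        $url
    );
}

echo encodeNonAsciiUrl('https://example.com/künstliche-intelligenz'), "\n";
// https://example.com/k%C3%BCnstliche-intelligenz
echo encodeNonAsciiUrl('http://test.com/jp/新'), "\n";
// http://test.com/jp/%E6%96%B0
```

Escaping only bytes at or above 0x80 is deliberate: calling `rawurlencode()` on the whole URL would also escape `:`, `/`, `?` and `%`, breaking the URL structure and double-encoding existing escapes.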

nadar commented

Maybe encoding the URL would make sense; I assume even URLs like http://test.com/jp/新 would fail.

Yes. Any non-ASCII URL would fail the check, and leaving it unencoded in a sitemap is against the spec.