Regex for removal of duplicate slashes not preceded by a protocol is too strict
Closed this issue · 0 comments
gcox commented
// Remove duplicate slashes if not preceded by a protocol
if (urlObj.pathname) {
  urlObj.pathname = urlObj.pathname.replace(/(?<!https?:)\/{2,}/g, '/');
}
Limiting the preceding protocol to only http or https is too strict. Valid URLs can contain other protocols (ftp, s3, git, etc.) in their path, and this regex breaks them by collapsing the // after the protocol into a single /.
Example URLs broken by this regex:
http://sindresorhus.com/s3://sindresorhus.com
becomes http://sindresorhus.com/s3:/sindresorhus.com
http://sindresorhus.com/git://sindresorhus.com
becomes http://sindresorhus.com/git:/sindresorhus.com
Real-world URL broken by this regex:
- https://images.megaphone.fm/8wdeoO3TExq3DnwqQWKXcWfnnGOJkA9kzcaZKo3Ue0M/plain/s3://megaphone-prod/podcasts/daef779e-0f93-11e9-8be1-ef6a540ab876/image/uploads_2F1561781562946-7xoryiijhz6-d6e84acd3033beaa78167923137f8d6c_2FEar_Biscuits_Profile_2019.jpg becomes https://images.megaphone.fm/8wdeoO3TExq3DnwqQWKXcWfnnGOJkA9kzcaZKo3Ue0M/plain/s3:/megaphone-prod/podcasts/daef779e-0f93-11e9-8be1-ef6a540ab876/image/uploads_2F1561781562946-7xoryiijhz6-d6e84acd3033beaa78167923137f8d6c_2FEar_Biscuits_Profile_2019.jpg
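One possible direction, offered as a rough sketch rather than a definitive fix: widen the lookbehind so it accepts anything shaped like a URI scheme (a letter followed by letters, digits, "+", "-" or "."), not just http and https. The helper names below are hypothetical and only for illustration. The sketch also assumes a runtime that supports variable-length lookbehind in regular expressions, which not every JavaScript engine does.

```javascript
// Current behavior from the issue: only http: and https: protect the "//".
function collapseSlashesStrict(pathname) {
  return pathname.replace(/(?<!https?:)\/{2,}/g, '/');
}

// Hypothetical alternative: leave a run of slashes alone when it directly
// follows something that looks like a scheme, e.g. "s3:", "git:", "ftp:".
function collapseSlashesAnyScheme(pathname) {
  return pathname.replace(/(?<![a-z][a-z\d+.-]*:)\/{2,}/gi, '/');
}

console.log(collapseSlashesStrict('/s3://sindresorhus.com'));     // '/s3:/sindresorhus.com'
console.log(collapseSlashesAnyScheme('/s3://sindresorhus.com'));  // '/s3://sindresorhus.com'
console.log(collapseSlashesAnyScheme('/git://sindresorhus.com')); // '/git://sindresorhus.com'
console.log(collapseSlashesAnyScheme('/foo//bar///baz'));         // '/foo/bar/baz'
```

The trade-off is that any path segment ending in a colon (for example /foo://bar) now keeps its double slash, so this is more permissive than the original check in both directions.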