Enhance ZnUrl to be able to parse according to rfc3986 & rfc8089
Closed this issue · 10 comments
As I understand it, the current ZnUrl>>parseFrom: string defaultScheme: defaultScheme does not appear to parse according to the 2005 RFC 3986 standard for URIs.
In particular, whether a URI uses // is not a property of its scheme. Mostly this does not matter, but it matters for the file URI.
The detailed syntax of the file URI is in the 2017 RFC 8089, which specifies that the host part can be left out (that is, file:path is allowed, where path must start with a /). Leaving out the host part (the part starting with // and up to but not including the next /) means the local file system.
Thus file:/seg1/seg2/seg3, file://localhost/seg1/seg2/seg3, and file:///seg1/seg2/seg3 are all the same. The first, with no host specified, means localhost. The second explicitly mentions localhost. The third leaves the host empty (nothing between // and the following /), which then defaults to localhost.
What ZnUrl should be able to do is accept all three, and in particular the single / format.
As an aside, for Pharo it might be useful to support a hostname called workingdirectory (or wdir for those who do not like typing).
@svenvc
For many years, ZnUrl has parsed URL schemes that either use the initial // (like http(s) and most others) or do not (like mailto:), but not both.
file:// has been treated as always requiring a double slash.
What you are suggesting (and which seems to be supported by the RFC you mention) is that, although the following works:
(FileLocator home / 'somefile.txt') resolve asUrl.
"file:///Users/sven/somefile.txt"
'file:///Users/sven/somefile.txt' asUrl asFileReference.
"File @ /Users/sven/somefile.txt"
the following should be parseable as being equivalent:
'file:/Users/sven/somefile.txt' asUrl.
As you can see in #schemesNotUsingDoubleSlash and #isSchemeNotUsingDoubleSlash: there is some support for this, but it is not enough to support what you want. This would require patching the already long #parseFrom:defaultScheme:.
If this can be done in an elegant (or at least acceptable) way that does not break anything, that would seem like a useful extension. Obviously, we'll need tests as well.
If you want, give it a try. I will have a look myself as well, but I can't promise anything.
Sven
I needed the functionality in something else I was working on. That corner does not need the full generality of URI parsing, so I will make it work in my corner and then return to this when I have some experience in dealing with it. I just wanted to leave it here as an enhancement rather than a bug.
I believe the best way forward is to grit one's teeth and parse according to the grammar in the RFC (there is a syntax diagram on Wikipedia which is slightly more humane).
It turned out to be not that difficult:
Although I now allow the simpler form when parsing, it prints using the more common/standard form.
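A minimal sketch of what this change should allow, assuming a Zinc HTTP Components version that includes it. The path is the one from the example earlier in this thread, and the comments restate what is described above rather than verified output.

```smalltalk
| short standard |
"Assumes a Zinc version that includes the change discussed in this thread."
short := 'file:/Users/sven/somefile.txt' asUrl.
standard := 'file:///Users/sven/somefile.txt' asUrl.

"Both forms should parse to the same URL ..."
short = standard.

"... and the short form should print using the file:/// form."
short printString.
```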
Wow - that was fast - wonderful.
This will be wonderful to get into Pharo.
Now that miracles happen - do you have an implementation of the resolution algorithm for relative URLs, as used in HTML, as part of Zinc? I am thinking of section 5 of RFC 3986, of which in particular 5.1.3 would look something like this in ZnUrl code:
(ZnUrl fromString: '/some/relative/path')
resolvedByRetrievalURL: (ZnUrl fromString: 'file:/Users/kasper/tmp/').
(ZnUrl fromString: '/some/relative/path')
resolvedByRetrievalURL:
(ZnUrl fromString: 'https://raw.githubusercontent.com/kasperosterbye/Microdown/dev/doc/').
Yes that is already implemented in ZnUrl>>#withRelativeReference:
The test suite is in ZnUrlTest>>#testReferenceResolution
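As a rough illustration of how #withRelativeReference: can be used, here is a small sketch. The base URL and references are the ones from the examples in RFC 3986 section 5.4, not from the Zinc test suite, and the results in the comments are what the RFC specifies.

```smalltalk
| base |
"Base URL taken from the reference resolution examples in RFC 3986 section 5.4."
base := 'http://a/b/c/d;p?q' asUrl.

"A relative-path reference replaces the last segment of the base path."
base withRelativeReference: 'g'.     "http://a/b/c/g according to the RFC"

"An absolute-path reference replaces the whole path of the base."
base withRelativeReference: '/g'.    "http://a/g according to the RFC"
```

Note that a reference starting with /, as in the question above, is an absolute-path reference under RFC 3986, so it replaces the base path rather than being appended to it.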
I always load Zinc HTTP Components from its original source (obviously).
It would indeed be good to sync again with Pharo 10, the problem is that there are many non-functional style and naming changes in Pharo that make the merge hard.
I do not think you would lose functionality by reloading Zinc HTTP Components, on the contrary, you only gain features.
Hi Sven,
could you use an OrderedDictionary instead of a Dictionary to store the arguments?
It is really useful to be able to iterate over the keys in the same order.
S.
I assume you mean the query arguments; these are kept in a ZnMultiValueDictionary, which is a subclass of OrderedDictionary.
At least in the latest upstream version.
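A small sketch of iterating over the query fields, assuming the upstream version mentioned above. The URL is a made-up example; with an ordered dictionary underneath, the keys are expected to come out in the order they appear in the URL.

```smalltalk
| url |
"Hypothetical URL, used only to show the iteration order of query fields."
url := 'http://localhost/search?b=2&a=1&c=3' asUrl.

"Write each query key and value to the Transcript, one pair per line."
url queryDo: [ :key :value |
	Transcript show: key; show: '='; show: value; cr ].
```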
Yes, this is good to know. Maybe the version in Pharo is not up to date, or I did not look at the inheritance.
I just checked, and the update of ZnUrl has not made it into Pharo 10. I am not sure what the procedure for that is, and we are getting closer to the deadline for 10. But then we should aim it for 11 I think. It is a nice update.
I tagged the issue for pharo10