%TAG prefix does not accept all characters in ns-uri-char production
gkellogg opened this issue · 2 comments
As noted in yaml/yaml-spec#268 (comment), Psych does not accept a %TAG prefix that includes a #, which seems to be due to the following code:
Lines 2603 to 2627 in f8f760f
According to the YAML 1.2 spec, the ns-uri-char production does include #, which is missing from the scanner.
[39] ns-uri-char ::=
(
'%'
[ns-hex-digit](https://yaml.org/spec/1.2.2/#rule-ns-hex-digit){2}
)
| [ns-word-char](https://yaml.org/spec/1.2.2/#rule-ns-word-char)
| '#'
| ';'
| '/'
| '?'
| ':'
| '@'
| '&'
| '='
| '+'
| '$'
| ','
| '_'
| '.'
| '!'
| '~'
| '*'
| "'"
| '('
| ')'
| '['
| ']'
This prevents creating a TAG line such as the following:
%TAG ! http://www.w3.org/2001/XMLSchema#
As a workaround, %TAG ! http://www.w3.org/2001/XMLSchema%23 works, but it is not ideal and shouldn't be required based on the grammar.
The scanning issue extends to inline tags as well. If you parse the following:
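For reference, the ns-uri-char production quoted above can be approximated as a Ruby regular expression. This is only a sketch for checking candidate strings against the grammar: the names `NS_URI_PUNCT`, `NS_URI_CHAR`, and `ns_uri_chars?` are hypothetical and not part of Psych or libyaml, and the check covers `ns-uri-char+` only, not the stricter first-character rule for tag prefixes.

```ruby
# Punctuation allowed by ns-uri-char, copied from the production above.
NS_URI_PUNCT = %q{#;/?:@&=+$,_.!~*'()[]}

# One ns-uri-char: a %XX escape, an ns-word-char (digit, ASCII letter, "-"),
# or one of the punctuation characters listed in the grammar.
NS_URI_CHAR = /%\h{2}|[0-9A-Za-z\-#{Regexp.escape(NS_URI_PUNCT)}]/

# True when the whole string is one or more ns-uri-char (hypothetical helper).
def ns_uri_chars?(text)
  /\A(?:#{NS_URI_CHAR})+\z/.match?(text)
end

ns_uri_chars?('http://www.w3.org/2001/XMLSchema#')   # => true
ns_uri_chars?('http://www.w3.org/2001/XMLSchema%23') # => true
ns_uri_chars?('http://example.org/a b')              # => false (space)
```

Both the literal `#` form and the `%23` form satisfy the grammar, which is why the scanner rejecting the first one looks like a bug rather than a spec restriction.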
%TAG !xsd! http://www.w3.org/2001/XMLSchema%23
---
date: !xsd!date 2022-08-08
and re-serialize it without the %TAG directive, you'll get the following:
date: !<http://www.w3.org/2001/XMLSchema%23date> 2022-08-08
Per the grammar, you should also be able to parse the following:
date: !<http://www.w3.org/2001/XMLSchema#date> 2022-08-08
But it fails in a similar manner to that reported for %TAG. In this case, it is the c-verbatim-tag production, which includes ns-uri-char+, where the # is again excluded.
Working around this requires a pre-parsing step to replace these characters as appropriate before parsing, and a matching step to restore them after serializing.
This is tested using Ruby Psych version 4.0.4, which wraps libyaml, and the issues seem to be entirely within the library.
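The pre-parsing and post-serializing steps mentioned above could look roughly like the following. This is a naive, text-level sketch: `escape_tag_hashes` and `unescape_tag_hashes` are hypothetical helpers, not Psych API, and the regular expressions assume that `!<...>` only ever appears as a verbatim tag (a scalar containing that text would also be rewritten). The round trip is also lossy if the original document already used `%23` in a tag.

```ruby
# Hypothetical helpers; not part of Psych or libyaml.
TAG_DIRECTIVE = /^(%TAG[ \t]+\S+[ \t]+)(\S+)/ # "%TAG <handle> <prefix>"
VERBATIM_TAG  = /!<([^>\s]*)>/                # "!<uri>"

# Rewrite only the tag prefix / verbatim tag URI, leaving the rest untouched.
def rewrite_tag_uris(yaml)
  yaml
    .gsub(TAG_DIRECTIVE) { m = Regexp.last_match; m[1] + yield(m[2]) }
    .gsub(VERBATIM_TAG)  { m = Regexp.last_match; "!<#{yield(m[1])}>" }
end

# Before parsing: replace "#" with "%23" so the scanner accepts the tag.
def escape_tag_hashes(yaml)
  rewrite_tag_uris(yaml) { |uri| uri.gsub('#', '%23') }
end

# After serializing: restore "#" in the emitted tags.
def unescape_tag_hashes(yaml)
  rewrite_tag_uris(yaml) { |uri| uri.gsub('%23', '#') }
end
```

The escaped text would be passed to the parser, and `unescape_tag_hashes` applied to whatever the emitter produces, so callers can keep writing the `#` form that the grammar permits.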