Should string decoding check the character set?
Hi,
To the best of my knowledge, string character sets are currently not verified, for example for `PrintableString` or `IA5String`. Is this correct?
To give some background for my question: suppose I parse a UTF-8 string and then call `std::str::from_utf8` on it, which I would expect to succeed.
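As a minimal sketch of that expectation, assuming the parser hands back the raw content octets unchecked (the helper name `content_as_str` is hypothetical and not part of der-parser):

```rust
/// Hypothetical helper: interpret the raw content octets of a string
/// object as UTF-8. This fails if the encoder wrote something else.
fn content_as_str(content: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(content)
}
```

With well-formed content such as `[0x63, 0x61, 0x66, 0xC3, 0xA9]` this returns `Ok("café")`, but a Latin-1 encoded `[0x63, 0x61, 0x66, 0xE9]` yields an error, so the caller cannot rely on success unless the parser has already validated the bytes.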
The X.509 Style Guide from 2000 discusses some problems with the different string types. `IA5String`, `PrintableString` and `VisibleString` seem relatively unproblematic, as they are more or less subsets of ASCII. `Teletex`/`T61String` are very odd. `BMPString` and `UniversalString` seem to be two-byte and four-byte Unicode encodings.
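For the ASCII-like types, a character set check could look roughly like the sketch below. The alphabets follow my reading of X.680 and the function names are hypothetical, not der-parser's API:

```rust
/// PrintableString alphabet: letters, digits, space and ' ( ) + , - . / : = ?
fn is_printable_char(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b" '()+,-./:=?".contains(&b)
}

/// IA5String: the full 7-bit ASCII range, including control characters.
fn is_ia5_char(b: u8) -> bool {
    b.is_ascii()
}

/// VisibleString: space plus the graphic ASCII characters (0x20..=0x7E).
fn is_visible_char(b: u8) -> bool {
    (0x20..=0x7e).contains(&b)
}

/// Hypothetical helper: return the content as `&str` only if every octet
/// belongs to the alphabet described by `allowed`.
fn checked_str(content: &[u8], allowed: fn(u8) -> bool) -> Option<&str> {
    if content.iter().all(|&b| allowed(b)) {
        // All three alphabets are ASCII subsets, so the content is valid UTF-8.
        std::str::from_utf8(content).ok()
    } else {
        None
    }
}
```

For instance, `checked_str(b"user@example.com", is_printable_char)` is rejected because `@` is not in the PrintableString alphabet, while the same bytes pass with `is_ia5_char`.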
The document also contains a section, "Comparing DNs", on what is a non-trivial problem because of these string types.
I am not sure how much of this is still relevant today. I will try to look into this.
Nevertheless, I think some support for handling the string types would be nice.
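For the two-byte and four-byte types, decoding could be sketched as follows, assuming `BMPString` is big-endian UCS-2 and `UniversalString` is big-endian UCS-4. The function names are hypothetical, and treating surrogates as errors is my own strict choice, not something the issue prescribes:

```rust
/// Decode BMPString content as big-endian 16-bit code points.
/// Surrogate code units are rejected, since UCS-2 has no surrogate pairs.
fn decode_bmp(content: &[u8]) -> Option<String> {
    if content.len() % 2 != 0 {
        return None;
    }
    content
        .chunks_exact(2)
        .map(|c| char::from_u32(u32::from(u16::from_be_bytes([c[0], c[1]]))))
        .collect()
}

/// Decode UniversalString content as big-endian 32-bit code points.
/// Values outside the Unicode scalar range are rejected.
fn decode_universal(content: &[u8]) -> Option<String> {
    if content.len() % 4 != 0 {
        return None;
    }
    content
        .chunks_exact(4)
        .map(|c| char::from_u32(u32::from_be_bytes([c[0], c[1], c[2], c[3]])))
        .collect()
}
```

Both functions return an owned `String`, because the content has to be re-encoded as UTF-8 and cannot be borrowed as `&str` directly.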
I used the certificates from these two test sets: [1], [2].
The only string types used were `PrintableString`, `IA5String` and `UTF8String`. I used `openssl asn1parse`.
There is another test tool, which was easy to find: [3]. I have not looked into this one.
Obviously der-parser is not limited to X.509. I would suggest parsing the "easy" string types (with checks of the allowed characters) and providing an API to access them as `&str`. For the legacy types, only raw `&[u8]` access would be available.
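To make the suggestion concrete, here is a rough sketch of the shape such an API could take. The type `Asn1String` and its variants are hypothetical and only illustrate the split between validated and legacy types; they are not der-parser's actual types:

```rust
/// Hypothetical string object: "easy" types hold validated text,
/// legacy types (e.g. T61String) only expose their raw octets.
#[derive(Debug, PartialEq)]
enum Asn1String<'a> {
    Utf8(&'a str),
    Printable(&'a str),
    Ia5(&'a str),
    Legacy(&'a [u8]),
}

impl<'a> Asn1String<'a> {
    /// Text access, available only for the validated types.
    fn as_str(&self) -> Option<&'a str> {
        match *self {
            Asn1String::Utf8(s) | Asn1String::Printable(s) | Asn1String::Ia5(s) => Some(s),
            Asn1String::Legacy(_) => None,
        }
    }

    /// Raw access, available for every type.
    fn as_bytes(&self) -> &'a [u8] {
        match *self {
            Asn1String::Utf8(s) | Asn1String::Printable(s) | Asn1String::Ia5(s) => s.as_bytes(),
            Asn1String::Legacy(b) => b,
        }
    }
}
```

The idea is that validation happens once at parse time, so `as_str()` never needs to re-check the bytes, and callers handling legacy types are forced to deal with the raw octets explicitly.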
Closing this, as most string types are now parsed and decoded, and can be obtained using `as_str()`.
Please open new issues for specific string types if others are needed.
Additional note: for X.509, I had to add support for non-RFC-compliant strings (bad encoding), for example in rusticata/x509-parser@b3beae8