Should string decoding check the character set?

Question

Closed this issue 4 years ago · 3 comments

Hi,

To the best of my knowledge, the character sets of string types such as PrintableString or IA5String are currently not verified. Is this correct?

To give some background for my question: suppose I parse a UTF8String and then call std::str::from_utf8 on its content, which I would expect to succeed.
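A minimal sketch of that expectation, using only the standard library. The helper name `utf8_string_as_str` is hypothetical, and `content` stands for the raw content octets a non-validating parser would hand back; it is not der-parser's API.

```rust
/// Hypothetical helper: `content` is assumed to be the raw content octets
/// of a UTF8String, as returned by a parser that did not validate them.
fn utf8_string_as_str(content: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // from_utf8 performs the check itself, so invalid octets surface here
    // as an error rather than at parse time.
    std::str::from_utf8(content)
}
```

If the parser already verified the encoding, the error branch would be unreachable for UTF8String values; without verification, the caller has to handle it.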

Answer 1 · 2019-09-20T08:17:03.000Z

The X.509 Style Guide from 2000 discusses some problems with the different string types. IA5String, PrintableString and VisibleString seem relatively unproblematic, as they are more or less subsets of ASCII.
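For these ASCII-like types, a character set check could look roughly like the sketch below. The function names are made up for illustration and are not part of der-parser. The alphabets are my reading of X.680: PrintableString allows letters, digits, space and `' ( ) + , - . / : = ?`; IA5String allows any 7-bit value; VisibleString allows space plus the graphic ASCII characters.

```rust
/// PrintableString: letters, digits, space and ' ( ) + , - . / : = ?
fn is_printable_string(bytes: &[u8]) -> bool {
    bytes
        .iter()
        .all(|&b| b.is_ascii_alphanumeric() || b" '()+,-./:=?".contains(&b))
}

/// IA5String: any 7-bit value (0x00..=0x7F), control characters included.
fn is_ia5_string(bytes: &[u8]) -> bool {
    bytes.is_ascii()
}

/// VisibleString: space plus the graphic ASCII characters (0x20..=0x7E).
fn is_visible_string(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| (0x20..=0x7e).contains(&b))
}

/// Hypothetical helper: run a check, then expose the bytes as &str.
fn checked_str<'a>(bytes: &'a [u8], check: fn(&[u8]) -> bool) -> Option<&'a str> {
    if check(bytes) {
        // All three checks above accept only ASCII, which is valid UTF-8.
        std::str::from_utf8(bytes).ok()
    } else {
        None
    }
}
```

Note that `@` and `_` are outside the PrintableString alphabet, which is why e-mail addresses usually end up in IA5String.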

TeletexString/T61String is very odd. BMPString and UniversalString seem to be two-byte and four-byte Unicode encodings, respectively.
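Assuming BMPString is big-endian UCS-2 and UniversalString is big-endian UCS-4, decoding them to an owned `String` could be sketched as follows. The function names are hypothetical, and the strictness (rejecting surrogate code units and odd lengths) is a choice made for this sketch.

```rust
/// BMPString sketch: big-endian 16-bit code units. Surrogate code units
/// are rejected, because char::from_u32 returns None for them.
fn decode_bmp_string(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    bytes
        .chunks_exact(2)
        .map(|c| char::from_u32(u32::from(u16::from_be_bytes([c[0], c[1]]))))
        .collect()
}

/// UniversalString sketch: big-endian 32-bit code points.
fn decode_universal_string(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    bytes
        .chunks_exact(4)
        .map(|c| char::from_u32(u32::from_be_bytes([c[0], c[1], c[2], c[3]])))
        .collect()
}
```

Unlike the ASCII-like types, these cannot be borrowed as `&str` from the input, since the bytes have to be re-encoded as UTF-8.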

The document also contains a section "Comparing DNs", which is a non-trivial problem because of these string types.

I am not sure how much of this is still relevant today. I will try to look into it.

Nevertheless, I think some support for handling the string types would be nice.

Answer 2 · 2019-09-20T09:26:55.000Z

I used the certificates from these two test sets: [1], [2].

The only string types used were PrintableString, IA5String and UTF8String. I used openssl asn1parse.

There is another test tool, which was easy to find: [3]. I have not looked into this one.

Obviously der-parser is not limited to X.509. I would suggest parsing the "easy" string types (with checks of the allowed characters) and providing an API to access them as &str. For the legacy types, only raw &[u8] access would be offered.
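To make the suggestion concrete, here is a rough sketch of what such an API could look like. All names (`StrKind`, `StrError`, `DerStr`) are invented for this example and are not der-parser's actual types.

```rust
/// Hypothetical string kinds; not der-parser's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StrKind {
    Printable,
    Ia5,
    Utf8,
    Teletex,
    Bmp,
    Universal,
}

#[derive(Debug, PartialEq)]
enum StrError {
    /// A byte outside the allowed character set, or invalid UTF-8.
    InvalidCharset,
    /// Legacy type: only raw byte access is offered in this sketch.
    RawOnly,
}

struct DerStr<'a> {
    kind: StrKind,
    data: &'a [u8],
}

impl<'a> DerStr<'a> {
    /// Raw content octets, available for every string type.
    fn as_bytes(&self) -> &'a [u8] {
        self.data
    }

    /// Checked text view, available for the "easy" types only.
    fn as_str(&self) -> Result<&'a str, StrError> {
        let valid = match self.kind {
            StrKind::Printable => self
                .data
                .iter()
                .all(|&b| b.is_ascii_alphanumeric() || b" '()+,-./:=?".contains(&b)),
            StrKind::Ia5 => self.data.is_ascii(),
            // No separate check: from_utf8 below validates the encoding.
            StrKind::Utf8 => true,
            StrKind::Teletex | StrKind::Bmp | StrKind::Universal => {
                return Err(StrError::RawOnly)
            }
        };
        if !valid {
            return Err(StrError::InvalidCharset);
        }
        std::str::from_utf8(self.data).map_err(|_| StrError::InvalidCharset)
    }
}
```

With this shape, callers get a borrowed `&str` without allocation for the common types, and can still fall back to `as_bytes()` when `as_str()` reports a legacy type.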

Answer 3 · 2020-11-02T13:15:55.000Z

Closing this, as most string types are now parsed and decoded, and can be obtained with as_str().
Please open new issues for specific string types if others are needed.

Additional note: for X.509, I had to add support for non-RFC-compliant strings (bad encoding), for example in rusticata/x509-parser@b3beae8