roundcube/roundcubemail

vCard: Use sabre/vobject library

respiranto opened this issue · 10 comments

The vCard code currently has its own custom vCard parser. There are many issues with the current code and I deemed it better to use an external library, sabre/vobject being the apparently best choice.

See in particular the bug report #9593.

This may also help to address #9590.

Question 1: How should / can I report errors on parsing?

  • (a) When loading an individual vCard (constructor / load(); should generally be from the database),
  • (b) When import()ing (generally from a user provided vCard file).

For (b), I'd like to cause a user-visible message like "Failed importing $person_name", "Failed importing record without name or e-mail address ($record_data)", "Failed importing vCard due to unsupported encoding".

Question 2 (backslash handling): Would it be OK to break #4189 (and Roundcube\Tests\Framework\VCardTest::test_parse_five)?

  • In v3.0 and v4.0, backslashes that are not explicitly permitted as part of an escape sequence are generally illegal.
  • In v2.1, backslashes should apparently generally be treated like any other character.
  • The current code drops stray backslashes (e.g., \: becomes :).
    • This seems to be invalid for v2.1. (For v3.0 and v4.0, the input is illegal.)
  • sabre/vobject seems to parse stray (invalid) backslashes as themselves.
    • Therefore, I cannot distinguish them from propely (v3.0/v4.0)-escaped backslashes (\\), except by working on the raw input (which I very much would like to avoid).
    • Sidenote: sabre/vobject seems to incorrectly parse v3.0/v4.0 escape sequences for v2.1.

Question 3 (charset support): Would it be OK to reduce the set of supported charsets to that supported by sabre/vobject?

  • This would notably remove support for UTF-16, breaking Roundcube\Tests\Framework\VCardTest::test_import.
  • sabre/vobject supports UTF-8, ISO-8859-1, Windows-1252.

Suggestion 4 (charset support): Restrict document charset to (and assume) UTF-8.

  • The CHARSET parameter would still be respected (for v2.1; v3.0 and v4.0 do not define it).
  • Reason:
    • We can drop the detect_charset() heuristic, which may give wrong results.
  • Justification:
    • v4.0 mandates UTF-8. - RFC 6350#3.1
    • v3.0:
      • "Character set can only be specified on the CHARSET parameter on the Content-Type MIME header field." - RFC 2426#5
      • "UTF-8 preferred" - RFC 2426#4
      • It seems v3.0 is supposed to have a mime header, which does not seem to be the case here.
    • v2.1:
      • "The default character set is ASCII." - vCard 2.1 (section 2)
        • I see no harm in extending this to UTF-8.
      • "Some transports (e.g., MIME based electronic mail) may also provide a character set property at the transport wrapper level." - vCard 2.1 (section 2)
      • Again, we do not have a MIME header or similar AFAIK.
  • See also Question 3.
  1. I would postpone error handling to later, after the whole migration is done. I guess we can do something like rcube_addressbook::set_error() and friends.
  2. It would be good to verify if GMail is still doing that. It would be good to be able to import from GMail without issues.
  3. I think we could do UTF-8 only, but we probably should investigate what GMail and others use, so we can import data from them without much hassle.
  4. Where is question 4?
  1. OK
  2. I'll check that. Otherwise, if we assume UTF-8, removing illegal backslashes should actually not be too difficult.
  3. Any particular other software or providers that should be tested? Next to GMail, I'd definitely test Thunderbird.
  4. Reference should have been to Question 3. Corrected.

Testing:

  • GMail: v3.0, UTF-8, replaces : by \:.
  • Thunderbird: v4.0, UTF-8, : unchanged, \: correctly encoded as \\:.

GMail example:

BEGIN:VCARD
VERSION:3.0
FN:Jane Doé
N:Doé;Jane;;;
item1.URL:https\://example.org/
item1.X-ABLabel:
NOTE:noté\\\:foo\:bar
CATEGORIES:myContacts
END:VCARD

Thunderbird example:

BEGIN:VCARD
VERSION:4.0
N:Doé;Jane;;;
FN:Jane Doé
NOTE:noté:foo\\:bar
UID:fdc25d89-21a0-4069-a93f-2d1a13a6260c
URL:http://exampe.org
END:VCARD

Note that I have found 15 issues (mostly bugs) for sabre/vobject so far, all of which we should be able to work with, some of which may require workarounds.

pabzm commented

@respiranto Wow, I'm impressed! Thank you very much already! (I'm currently mostly distracted by other topics, unfortunately, so for me it might take a while to react to questions.)