vCard: Use sabre/vobject library
respiranto opened this issue · 10 comments
The vCard code currently has its own custom vCard parser. There are many issues with the current code and I deemed it better to use an external library, sabre/vobject being the apparently best choice.
See in particular the bug report #9593.
This may also help to address #9590.
I started doing things: https://github.com/respiranto/roundcubemail/commits/devel-vcard-sabre-vobject
Question 1: How should / can I report errors on parsing?
- (a) When loading an individual vCard (constructor /
load()
; should generally be from the database), - (b) When
import()
ing (generally from a user provided vCard file).
For (b), I'd like to cause a user-visible message like "Failed importing $person_name", "Failed importing record without name or e-mail address ($record_data)", "Failed importing vCard due to unsupported encoding".
Question 2 (backslash handling): Would it be OK to break #4189 (and Roundcube\Tests\Framework\VCardTest::test_parse_five
)?
- In v3.0 and v4.0, backslashes that are not explicitly permitted as part of an escape sequence are generally illegal.
- In v2.1, backslashes should apparently generally be treated like any other character.
- The current code drops stray backslashes (e.g.,
\:
becomes:
).- This seems to be invalid for v2.1. (For v3.0 and v4.0, the input is illegal.)
- sabre/vobject seems to parse stray (invalid) backslashes as themselves.
- Therefore, I cannot distinguish them from propely (v3.0/v4.0)-escaped backslashes (
\\
), except by working on the raw input (which I very much would like to avoid). - Sidenote: sabre/vobject seems to incorrectly parse v3.0/v4.0 escape sequences for v2.1.
- Therefore, I cannot distinguish them from propely (v3.0/v4.0)-escaped backslashes (
Question 3 (charset support): Would it be OK to reduce the set of supported charsets to that supported by sabre/vobject?
- This would notably remove support for UTF-16, breaking
Roundcube\Tests\Framework\VCardTest::test_import
. - sabre/vobject supports UTF-8, ISO-8859-1, Windows-1252.
Suggestion 4 (charset support): Restrict document charset to (and assume) UTF-8.
- The
CHARSET
parameter would still be respected (for v2.1; v3.0 and v4.0 do not define it). - Reason:
- We can drop the
detect_charset()
heuristic, which may give wrong results.
- We can drop the
- Justification:
- v4.0 mandates UTF-8. - RFC 6350#3.1
- v3.0:
- "Character set can only be specified on the CHARSET parameter on the Content-Type MIME header field." - RFC 2426#5
- "UTF-8 preferred" - RFC 2426#4
- It seems v3.0 is supposed to have a mime header, which does not seem to be the case here.
- v2.1:
- "The default character set is ASCII." - vCard 2.1 (section 2)
- I see no harm in extending this to UTF-8.
- "Some transports (e.g., MIME based electronic mail) may also provide a character set property at the transport wrapper level." - vCard 2.1 (section 2)
- Again, we do not have a MIME header or similar AFAIK.
- "The default character set is ASCII." - vCard 2.1 (section 2)
- See also Question 3.
- I would postpone error handling to later, after the whole migration is done. I guess we can do something like
rcube_addressbook::set_error()
and friends. - It would be good to verify if GMail is still doing that. It would be good to be able to import from GMail without issues.
- I think we could do UTF-8 only, but we probably should investigate what GMail and others use, so we can import data from them without much hassle.
- Where is question 4?
- OK
- I'll check that. Otherwise, if we assume UTF-8, removing illegal backslashes should actually not be too difficult.
- Any particular other software or providers that should be tested? Next to GMail, I'd definitely test Thunderbird.
- Reference should have been to Question 3. Corrected.
Testing:
- GMail: v3.0, UTF-8, replaces
:
by\:
. - Thunderbird: v4.0, UTF-8,
:
unchanged,\:
correctly encoded as\\:
.
GMail example:
BEGIN:VCARD
VERSION:3.0
FN:Jane Doé
N:Doé;Jane;;;
item1.URL:https\://example.org/
item1.X-ABLabel:
NOTE:noté\\\:foo\:bar
CATEGORIES:myContacts
END:VCARD
Thunderbird example:
BEGIN:VCARD
VERSION:4.0
N:Doé;Jane;;;
FN:Jane Doé
NOTE:noté:foo\\:bar
UID:fdc25d89-21a0-4069-a93f-2d1a13a6260c
URL:http://exampe.org
END:VCARD
Note that I have found 15 issues (mostly bugs) for sabre/vobject so far, all of which we should be able to work with, some of which may require workarounds.
@respiranto Wow, I'm impressed! Thank you very much already! (I'm currently mostly distracted by other topics, unfortunately, so for me it might take a while to react to questions.)