mangstadt/ez-vcard

vCards broken on systems with locale with non-Western digits

rfc2822 opened this issue · 5 comments

Hi,

Thanks for ez-vcard, which works really reliable. I have however found a problem when the default locale is set to something with non-Western (like Arabic, Farsi, …) digits. In this case, some properties which are internally stored as numbers are printed with localized numbers instead of ASCII digits. For instance, on a system with an Iran (Persian; native digits) locale, a birthday on 5 Sep 1998 is generated as ۱۹۹۸۰۹۰۵.

See here for the original discovery and more information.

I have written some tests:

import ezvcard.Ezvcard
import ezvcard.VCard
import ezvcard.VCardVersion
import ezvcard.property.Geo
import org.junit.Assert.assertEquals
import org.junit.Before
import org.junit.Test
import java.util.*

class TestLocaleNonWesternDigits {

    companion object {
        val locale = Locale("fa", "ir", "u-un-arabext")
    }

    @Before
    fun verifyLocale() {
        assertEquals("Persian (Iran) locale not available", "fa-IR", locale.toLanguageTag())
        Locale.setDefault(locale)
    }

    @Test
    fun testLocale_StringFormat() {
        assertEquals("۲۰۲۰", String.format("%d", 2020))    // does not fail if the Locale with Persian digits is available
    }

    @Test
    fun testLocale_StringFormat_Root() {
        assertEquals("2020", String.format(Locale.ROOT, "%d", 2020))
    }

    @Test
    fun testLocale_ezVCard() {
        val vCard = VCard(VCardVersion.V4_0)
        vCard.geo = Geo(1.0, 2.0)
        assertEquals("BEGIN:VCARD\r\n" +
                "VERSION:4.0\r\n" +
                "PRODID:ez-vcard 0.11.1\r\n" +
                "GEO:geo:1.0,2.0\r\n" +             // fails: is "GEO:geo:۱.۰,۲.۰\r\n" instead
                "END:VCARD\r\n", Ezvcard.write(vCard).go())
    }
    
}

Sorry, I have too late noticed that they're in Kotlin instead of Java, but it's just a few lines. I hope they demonstrate the problem nevertheless. For testing, it's important that the system / JVM supports native digits and the selected locale. For testing, you can use other non-Western locales, like ar_SA or th_TH. I was not able to select such a locale on my Linux system with OpenJDK 1.8.0_242-release, but it works with Android's JVM (tested with Android 10 and 11 emulator).

I think the numbers are converted to strings in a locale-sensitive manner like String.printf("%d", number) or SimpleDateFormatter. My suggestion is to use String.printf(Locale.US, "%d", number) (or Locale.ROOT?) wherever vCard content is generated because this is what's intended: the numbers shall be printed in Western digits.

Maybe this is applicable to parsing, too. Haven't tested that yet (parse properties with correct Western digits with ez-vcard on systems with non-Western digit locale).

(ez-vcard 0.11.1)

Thanks for the report and the unit test!

Fixed in dce98f3 and 4e9da4f (there were some unit test failures during CI).

Locale.ROOT seems to work fine. It is probably a better choice than Locale.US because it's more "neutral".

I wrote unit tests that test the new code over every locale on the local JVM. The fix works for all of them on my machine and the CI builds. Also, my testing shows that ez-vcard is able to parse numbers that are in the vCard decimal format (i.e. using western-style numbers) no matter the default locale.

Wow, this was fast! :) I think this would apply to VCardDateFormat, too. SimpleDateFormat also uses Locale-dependent digits.

You're right, it does! Thanks for thinking of that! Fixed in e6a1b2c.

I just found that PartialDate's toISO8601 uses NumberFormat, which is also localizable and thus causes the same problem for partial dates.

NumberFormat can be adjusted so that it uses the ROOT locale, too.

Thanks for pointing that out. I found several other places that are affected by this as well.

I must have assumed that these instances would not be affected by this issue because the numbers were not using decimal or grouping separators.

Fixed in b8b0b0c.