postgres - simplified chinese collation on BTP Hyperscaler Option
Opened this issue · 2 comments
Description of erroneous behaviour
Hello,
when using "Simplified Chinese" in SAP Build Work Zone, requests use the "zh_CN" locale. On queries with orderBy, the method orderByICU in PostgresService.js translates "zh_CN" to "zh-CN-x-icu". Those queries fail on the PostgreSQL Hyperscaler Option (AWS, PostgreSQL version 14.9) with:
collation "zh-CN-x-icu" for encoding "UTF8" does not exist
Turns out that the following locales beginning with "zh" are available:
collname | collcollate | collctype | collversion |
---|---|---|---|
zh-Hans-CN-x-icu | zh-Hans-CN | zh-Hans-CN | 153.80.32.1 |
zh-Hans-HK-x-icu | zh-Hans-HK | zh-Hans-HK | 153.80.32.1 |
zh-Hans-MO-x-icu | zh-Hans-MO | zh-Hans-MO | 153.80.32.1 |
zh-Hans-SG-x-icu | zh-Hans-SG | zh-Hans-SG | 153.80.32.1 |
zh-Hans-x-icu | zh-Hans | zh-Hans | 153.80.32.1 |
zh-Hant-HK-x-icu | zh-Hant-HK | zh-Hant-HK | 153.80.32.1 |
zh-Hant-MO-x-icu | zh-Hant-MO | zh-Hant-MO | 153.80.32.1 |
zh-Hant-TW-x-icu | zh-Hant-TW | zh-Hant-TW | 153.80.32.1 |
zh-Hant-x-icu | zh-Hant | zh-Hant | 153.80.32.1 |
zh-x-icu | zh | zh | 153.80.32.1 |
zh_CN | zh_CN | zh_CN | 2.26-59.amzn2 |
zh_CN | zh_CN.utf8 | zh_CN.utf8 | 2.26-59.amzn2 |
zh_CN.gb2312 | zh_CN.gb2312 | zh_CN.gb2312 | 2.26-59.amzn2 |
zh_CN.utf8 | zh_CN.utf8 | zh_CN.utf8 | 2.26-59.amzn2 |
zh_HK | zh_HK.utf8 | zh_HK.utf8 | 2.26-59.amzn2 |
zh_HK.utf8 | zh_HK.utf8 | zh_HK.utf8 | 2.26-59.amzn2 |
zh_SG | zh_SG.utf8 | zh_SG.utf8 | 2.26-59.amzn2 |
zh_SG | zh_SG | zh_SG | 2.26-59.amzn2 |
zh_SG.gb2312 | zh_SG.gb2312 | zh_SG.gb2312 | 2.26-59.amzn2 |
zh_SG.utf8 | zh_SG.utf8 | zh_SG.utf8 | 2.26-59.amzn2 |
zh_TW | zh_TW.utf8 | zh_TW.utf8 | 2.26-59.amzn2 |
zh_TW | zh_TW.euctw | zh_TW.euctw | 2.26-59.amzn2 |
zh_TW.euctw | zh_TW.euctw | zh_TW.euctw | 2.26-59.amzn2 |
zh_TW.utf8 | zh_TW.utf8 | zh_TW.utf8 | 2.26-59.amzn2 |
where "zh-CN-x-icu" is missing but "zh-Hans-CN-x-icu" would be available - "Hans" for "Han (simplified variant)" Source
We managed to get it to work in our project by creating the missing collation via
CREATE COLLATION "zh-CN-x-icu" from "zh-Hans-CN-x-icu";
but perhaps this could be integrated natively into @cap-js/postgres?
Detailed steps to reproduce
execute query containing "orderBy" on locale "zh_CN" on PostgreSQL Hyperscaler Option
Details about your project
customer-project | |
---|---|
OData version | v4 |
Node.js version | v18.19.1 |
@sap/cds | 7.9.3 |
@sap/cds-compiler | 4.9.2 |
@sap/cds-dk | 7.9.4 |
@cap-js/postgres | 1.8.0 |
@cap-js/sqlite | 1.7.0 |
@martin-kl Thank you for reporting this productive behavior.
As we don't have access to all deployments of Postgres out there. We had to follow and trust the documentation of Postgres here. Which states that the icu
localization auto generates C
equivalent collations with the suffix -x-icu
. It seems that zh_CN
is an exception as the icu
definition has colliding definitions with the traditional
and simplified
versions being of equal weight.
Currently @cap-js/postgres
does an initial check whether the database instance has icu
collations defined (code). If that is the case it will use icu
as collation, but when the collation en-x-icu
does not exist it will use the C
collations. It will also check that when a locale comes in that the database knows the C
collation. If not it fallbacks to using no collation in queries in that unknown locale.
@martin-kl would it have an impact whether zh_CN
or zh-Hans-CN-x-icu
is used ? Or would it be sufficient if we provide an option to pick the C
collation over the icu
collations ?
@BobdenOs thanks for the quick reply.
As far as I can see it doesn't make a difference between zh_CN
and zh-Hans-CN-x-icu
in our application - so both would be fine for us.
Generally speaking as CAP developer I'd like it to work automatically, meaning that I don't need an option if it works with either the C
or the icu
collation out-of-the-box.