pramsey/pgsql-ogr-fdw

Encode strings to destination encoding

Closed this issue · 8 comments

With Issue 101 resolved, need to encode strings to database encoding. This might be slow calling GetDatabaseEncoding() every time. Still have issues with extended characters.
dest_encoding.txt

This will not work with wchar_t strings. See #101

robe2 commented

Okay I think I got the gdal libiconv sorted out. I still don't know why it didn't work but building a new libiconv (1.16) and new curl (1.73.0) fixed both my missing libiconv and curl dependency. and voila winnie is happy again.

https://winnie.postgis.net:444/job/OGRFDW_PGVersion/606/console


============== running regression test queries        ==============

test file                         ... ok          609 ms

test import                       ... ok         3806 ms

=====================
 All 2 tests passed. 
=====================

After a lot of digging, I found iconv is extremely helpful but not required. The function I need is CPLRecode() to convert from LATIN1 to UTF8 This is possible without iconv but only has 3 encodings in the GDAL function CPLRecodeStub(). If you compile with libiconv, every encoding is possible that is supported by iconv --list.
If I can get the source encoding specified in the create server command via
OPTIONS (datasource 'foo@bar', format 'ODBC', open_options 'ENCODING=LATIN1')
,then I need to get the database encoding in GDAL format and conversion is done via:
char *CPLRecode( const char *pszSource,
const char *pszSrcEncoding, const char *pszDstEncoding );
I have it hardcoded now and it is working perfectly.

Much better solution. When you create the FDW server this will read the options section and look for an ENCODING parameter, i.e.:
CREATE SERVER xxx
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (datasource 'ODBC:foo@bar', format 'ODBC', open_options 'ENCODING=LATIN1');

Then it will encode the string from the source to the destination encoding. Tested on PG12/GDAL 2.4
To do:
Right now the destination is hard coded to UTF8.
No error checking for a valid encoding parameter.
CPLGetConfigOption("ENCODING", "") is called all the time should be stored in a global to increase speed.
encoding_options.txt

Huge memory leak in the above patch. Looking into it...

So, I'm not sure that open_options 'ENCODING=LATIN1' does what you think it does, and you're pushing open options into config options too... the open options get passed to the GDAL open routine, and then GDAL does what it needs to do. And if you go searching for that option in the GDAL code base, you'll find a handful of drivers that actually do something with it.

Basically you're just using the open option in GDAL to hold what should be information local the OGR FDW, which means really it should be passed as an FDW option and held in the FDW state, not passed into GDAL which has its own ideas about what to do with encodings.

Check out #197, @bowguy