PostgreSQL supports several character encodings, including SQL_ASCII, UTF8, and WIN1252.
However, SQL_ASCII, a common encoding in older databases, works differently from the others.
It does not enforce any particular encoding, so it accepts bytes from any encoding without validation.
It is therefore up to the client to make sure that only ASCII characters are stored.
Because SQL_ASCII does not enforce any encoding, converting a database from SQL_ASCII
to the preferred encoding, UTF8, can be very challenging.
Characters in different encodings can end up in the same database, or even in the same table,
so loading the data into the target database may fail with errors such as:
ERROR: invalid byte sequence for encoding "UTF8 ..."
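The problem can be reproduced in plain Python, without a database. The sketch below assumes one value was written by a WIN1252 client and another by a UTF-8 client; the sample strings and the helper name `is_valid_utf8` are illustrative and not part of this program.

```python
# Two clients store the same word in a SQL_ASCII database using different encodings.
# SQL_ASCII keeps both byte strings exactly as received, without validation.
from_win1252_client = "caf\u00e9".encode("cp1252")  # b'caf\xe9'
from_utf8_client = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(from_utf8_client))     # True
print(is_valid_utf8(from_win1252_client))  # False
```

The second value is the kind of byte sequence a UTF8 target database rejects.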
This program helps convert a PostgreSQL database encoded as SQL_ASCII to UTF-8.
It takes a list of tables from its configuration and produces a CSV dump of each table.
It then converts each CSV file to UTF-8 using the configured Python error handlers.
Finally, it can help create the target tables and load the CSV files into the destination database.
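To illustrate what a Python error handler does during the conversion step, here is a minimal sketch. It is not this program's actual code: the function name `to_utf8` and its parameters are hypothetical, and it assumes the dump is decoded as UTF-8 with the handler deciding what happens to bytes that do not fit.

```python
def to_utf8(raw: bytes, source_encoding: str = "utf-8", errors: str = "replace") -> bytes:
    """Decode raw bytes with the given error handler and re-encode them as UTF-8.

    `errors` is any standard Python codec error handler, for example
    "strict", "ignore", "replace", or "backslashreplace".
    """
    text = raw.decode(source_encoding, errors=errors)
    return text.encode("utf-8")


# One CSV row holding a WIN1252-encoded value next to a UTF-8-encoded value.
mixed_row = b"caf\xe9,caf\xc3\xa9"

print(to_utf8(mixed_row, errors="replace"))           # b'caf\xef\xbf\xbd,caf\xc3\xa9'
print(to_utf8(mixed_row, errors="ignore"))            # b'caf,caf\xc3\xa9'
print(to_utf8(mixed_row, errors="backslashreplace"))  # b'caf\\xe9,caf\xc3\xa9'
```

"replace" substitutes U+FFFD for each invalid byte, "ignore" drops it, and "backslashreplace" keeps it as a visible escape so it can be found and fixed later. With "strict", the same row raises UnicodeDecodeError instead.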
Set the PGPASSWORD environment variable to the PostgreSQL database user's password,
and configure the remaining settings in the config.yaml file.
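PGPASSWORD is the standard environment variable that libpq-based clients read for the connection password. A pre-flight check like the following can catch a missing value early; the helper name `require_pgpassword` is hypothetical, and whether this program performs such a check itself is not stated here.

```python
import os


def require_pgpassword(env=None) -> str:
    """Return the value of PGPASSWORD, or raise if it is missing or empty.

    `env` defaults to the process environment; passing a dict makes the
    function easy to try out without touching real environment variables.
    """
    env = os.environ if env is None else env
    password = env.get("PGPASSWORD")
    if not password:
        raise RuntimeError("PGPASSWORD is not set; export it before running the conversion")
    return password
```

Calling `require_pgpassword()` with no arguments checks the real environment of the current process.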
Then simply run:
$ python main.py
The converted CSV files can be found inside the csv folder.