duckdb/duckdb-postgres

postgres_attach fails with SQL syntax error when postgres contains table with column of type "array of enum"

Closed this issue · 7 comments

What happens?

When you have table in postgres, containing a column whose type is "array of enum",
running postgres_attach fails with error like this:

Error: IO Error: Unable to query Postgres: ERROR:  syntax error at or near "."
LINE 1: SELECT unnest(enum_range(NULL::.myenum))
                                       ^

To Reproduce

  1. Start latest version (15.2) of postgresql locally in docker (but it's also reproducible with older postgres, like 12.11)
docker run \
-p 5432:5432 \
-e POSTGRES_USER=test \
-e POSTGRES_PASSWORD=test \
-e POSTGRES_DB=test \
-e PGDATA=/var/lib/postgresql/data/pgdata \
--tmpfs /var/lib/postgresql/data/pgdata \
postgres:15.2-alpine \
-c log_statement=all
  1. Connect to test db using psql
psql "host=localhost user=test password=test"

and init simple DB schema using following statements (note the column whose type is array of some enum)

CREATE TYPE myenum as ENUM ('whatever');
CREATE TABLE test (array_of_myenum myenum[]);
  1. Start duckdb from CLI and try postgres_attach
~/Tmp> duckdb 
v0.7.0 f7827396d7
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D LOAD postgres;
D CALL postgres_attach('host=localhost user=test password=test');
Error: IO Error: Unable to query Postgres: ERROR:  syntax error at or near "."
LINE 1: SELECT unnest(enum_range(NULL::.myenum))
                                       ^
 ERROR:  syntax error at or near "."
LINE 1: SELECT unnest(enum_range(NULL::.myenum))
                                       ^

OS:

Linux - Fedora 36

PostgreSQL Version:

reproduced in 12.11 and 15.2

DuckDB Version:

v0.7.0 f7827396d7

DuckDB Client:

duckdb cli

Full Name:

Jan Hrček

Affiliation:

Holmusk

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

I'm also running into this -- quite a lot of our arrays are enum types ha.

@jhrcek did you ever figure out a work around?

No I didn't yet figure it out.
For now I'm just "importing" individual tables that don't have enum-array columns, using postgres_scan https://duckdb.org/docs/guides/import/query_postgres.html

Gotcha. Thanks for opening this @jhrcek. This is somewhat if a blocker for a project so I'll try to open a PR 😄 .

I think I've found the issue. Array types go down this code path:

	// TODO better check, does the typtyp say something here?
	// postgres array types start with an _
	if (StringUtil::StartsWith(pgtypename, "_")) {
		return LogicalType::LIST(DuckDBType2(ele_info, atttypmod, nullptr, conn, context));
	}

The ele_info value is being passed as DuckDBType2's type_info argument. A few lines down, when the new instance of DuckDBType2 reaches enum unnesting it fails because the type_info->nspname is empty:

	if (type_info->typtype == "e") { // ENUM
		auto res = PGQuery(
		    conn, StringUtil::Format("SELECT unnest(enum_range(NULL::%s.%s))", type_info->nspname, type_info->typname));
		Vector duckdb_levels(LogicalType::VARCHAR, res->Count());
		for (idx_t row = 0; row < res->Count(); row++) {
			duckdb_levels.SetValue(row, res->GetString(row, 0));
		}

The issue is that the ele_info object is not getting nspname set initially here:

		info.elem_info.typname = res->GetString(row, 7);
		info.elem_info.typlen = res->GetInt64(row, 8);
		info.elem_info.typtype = res->GetString(row, 9);

Ultimately, this should be the fix but I will confirm in Discord:

                info.elem_info.nspname = res->GetString(row, 2);
		info.elem_info.typname = res->GetString(row, 7);
		info.elem_info.typlen = res->GetInt64(row, 8);
		info.elem_info.typtype = res->GetString(row, 9);

Gotcha. Thanks for opening this @jhrcek. This is somewhat if a blocker for a project so I'll try to open a PR 😄 .

I think I've found the issue. Array types go down this code path:

	// TODO better check, does the typtyp say something here?
	// postgres array types start with an _
	if (StringUtil::StartsWith(pgtypename, "_")) {
		return LogicalType::LIST(DuckDBType2(ele_info, atttypmod, nullptr, conn, context));
	}

The ele_info value is being passed as DuckDBType2's type_info argument. A few lines down, when the new instance of DuckDBType2 reaches enum unnesting it fails because the type_info->nspname is empty:

	if (type_info->typtype == "e") { // ENUM
		auto res = PGQuery(
		    conn, StringUtil::Format("SELECT unnest(enum_range(NULL::%s.%s))", type_info->nspname, type_info->typname));
		Vector duckdb_levels(LogicalType::VARCHAR, res->Count());
		for (idx_t row = 0; row < res->Count(); row++) {
			duckdb_levels.SetValue(row, res->GetString(row, 0));
		}

The issue is that the ele_info object is not getting nspname set initially here:

		info.elem_info.typname = res->GetString(row, 7);
		info.elem_info.typlen = res->GetInt64(row, 8);
		info.elem_info.typtype = res->GetString(row, 9);

Ultimately, this should be the fix but I will confirm in Discord:

                info.elem_info.nspname = res->GetString(row, 2);
		info.elem_info.typname = res->GetString(row, 7);
		info.elem_info.typlen = res->GetInt64(row, 8);
		info.elem_info.typtype = res->GetString(row, 9);

That sounds like it could fix the issue indeed - happy to review a PR.

Appreciate the quick response @Mytherin. I'll follow up with a PR shortly. Cheers!

@Mytherin @jhrcek Here's the PR: #86. Thanks

This should be fixed now