`col_type_is` cannot handle expected type `interval second(0)`
ewie opened this issue · 24 comments
Just upgraded to pgTAP 1.3.0 and now the following test fails on Postgres 12.16:
BEGIN;
CREATE TABLE t (i interval second(0));
SELECT plan(2);
-- Test #1
SELECT col_type_is('t', 'i', 'interval second(0)');
ALTER TABLE t ALTER i TYPE interval(0);
-- Test #2
SELECT col_type_is('t', 'i', 'interval second(0)');
SELECT finish();
ROLLBACK;
Test output:
# Failed test 1: "Column t.i should be type interval second(0)"
# have: interval second(0)
# want: interval(0)
# Looks like you failed 1 test of 2
Test #2 succeeds after altering the column type to just interval(0)
. But it feels wrong to add a database migration to change the column type just for fixing that test.
Same output also when changing the expected type to just interval(0)
.
What confuses me about the output is that it says want: interval(0)
although interval second(0)
is given as the expected type.
According to https://www.postgresql.org/docs/12/datatype-datetime.html, interval second(0)
and interval(0)
are equivalent (at least that's how I read it):
Note that if both fields and p are specified, the fields must include
SECOND
, since the precision applies only to the seconds.
Am I missing something with the latest changes to col_type_is
in 1e1d745?
We're experiencing the same issue. If it helps, here are some more type-identifying strings affected:
interval year
interval second(3)
timestamp(3) without time zone
It would seem that a common thread is these use some of the more exotic ways that typmod
can be converted into a type-identifying string.
The full tests affected are visible in the PR https://github.com/centerofci/mathesar/pull/3156/files
This happens because pgTAP relies on the canonical spelling of a type, which you can see using pg_typeof()
:
david=# select pg_typeof('3'::interval second(0));
pg_typeof
-----------
interval
That said, we could probably integrate the _typename()
function added in #294 in more places to transparently convert alternate names to their canonical names:
david=# select _typename('interval second(0)');
_typename
-----------
interval
Which would, I think, be a big help, but would be a pretty invasive change --- lots of places will need to call that function, every one of them creating a savepoint/subtransaction.
+1 from me on @ewie's approach to solving the problem. Comparing the literal types as an override is a small amount of code and will not result in false positives. I'd like to get this solved with rapidity as our suite is also broken, so if another approach to this issue is desired let me know and I can explore a fix.
It fixes col_type_is()
, yes, but what about all the other places where the type can be aliased? I see 9 instances of _quote_ident_like()
, for instance. And there are lots of places where it compares values to what's in the catalog — I see 24 instances of typname
, which is just a name
column. Those all need to be updated, too, because stuff like this doesn't work:
david=# SELECT TRUE FROM pg_type WHERE typname = 'interval second(0)';
?column?
----------
(0 rows)
But if we use regtype
it works:
SELECT TRUE FROM pg_type WHERE typname = 'interval second(0)'::regtype::name;
?column?
----------
t
(1 row)
Or just:
david=# SELECT TRUE FROM pg_type WHERE oid = 'interval second(0)'::regtype;
?column?
----------
t
(1 row)
The point is, we need to pretty comprehensively need to handle type aliases through pgTAP, not just in a few places.
Meanwhile, the docs have always said that types must be spelled as they are spelled by Postgres itself. So don't change the type of your column from interval second(0)
to interval(0)
. Just change the test to use interval(0)
. Or just interval
if that's what works:
david=# select pg_typeof('0'::interval(0));
pg_typeof
-----------
interval
I see. But then col_type_is()
is inconsistent regarding type modifiers. _quote_ident_like()
handles type modifiers to some degree, but only by extracting the "precision" from the provided type name:
Line 1468 in 1e1d745
But this information is nevertheless stored as the type modifier in pg_attribute.atttypmod
which can be restored to a human-readable form with format_type()
. As I understand it, type modifiers are only relevant to columns (I can't find any other reference than atttypmod
). So only col_type_is()
should have to handle type modifiers, instead of all functions that call _quote_ident_like()
.
I think we have different ideas on what col_type_is()
should accomplish. I see the point in just testing the bare type (without modifier) in order to just tell if a column supports certain operations which are only determined by the bare type and not the type modifier. But to me col_type_is()
has been about the column type and type modifier before pgTAP 1.3.0.
If you don't like changing col_type_is()
, which I understand, then how about a new function col_typemod_is()
or col_typeformat_is()
which verifies the full column type with format_type(atttypid, atttypmod) = want_type
? Right now pgTAP does not have a function to test whether a column is interval second
or interval year
.
According to https://www.postgresql.org/docs/12/datatype-datetime.html,
interval second(0)
andinterval(0)
are equivalent (at least that's how I read it):
Even if those types act the same (not obvious to me that this is true under all circumstances), those are different typmod
s:
mathesar=# CREATE TABLE example (col1 interval(0), col2 interval second(0));
CREATE TABLE
mathesar=# \d example
Table "public.example"
Column | Type | Collation | Nullable | Default
--------+--------------------+-----------+----------+---------
col1 | interval(0) | | |
col2 | interval second(0) | | |
mathesar=# SELECT attname, atttypid, atttypmod FROM pg_attribute WHERE attrelid='example'::regclass::oid AND attnum>0;
attname | atttypid | atttypmod
---------+----------+------------
col1 | 1186 | 2147418112
col2 | 1186 | 268435456
(2 rows)
I think it would be valuable to be able to distinguish between these when testing.
Right now pgTAP does not have a function to test whether a column is
interval second
orinterval year
.
This is a problem for us as well. The typmod
of a column is an important thing to be able to test.
I see. But then
col_type_is()
is inconsistent regarding type modifiers._quote_ident_like()
handles type modifiers to some degree, but only by extracting the "precision" from the provided type name:
As of 1.3.0, the precision doesn't work properly for timestamp columns at all. If I'm understanding the problem, there's no possible spelling of timestamp(3)
or timestamp(3) without time zone
that you can test against.
Right now my test uses format_type
directly:
SELECT
is(
format_type(atttypid, atttypmod),
'interval second(0)',
'Column t.i should be type interval second(0)'
)
FROM
pg_attribute
WHERE
attrelid = 't'::regclass
AND attname = 'i';
It's just a single test case being affected, so I don't need a pgTAP function for that right now. But I'll implement one when I find the time.
But I found one case (#2) in our code base where col_type_is
still works as before:
BEGIN;
SELECT plan(3);
SELECT is(pgtap_version(), 1.3, 'pgTAP 1.3');
CREATE SCHEMA postgis;
CREATE EXTENSION postgis SCHEMA postgis;
CREATE TABLE public.mytbl (
geom postgis.geometry(polygon, 4326)
);
SELECT col_type_is(
'public'::name, 'mytbl'::name, 'geom'::name, 'postgis.geometry(Polygon,4326)'
);
SELECT col_type_is(
'public'::name, 'mytbl'::name, 'geom'::name, 'postgis'::name, 'geometry(Polygon,4326)'
);
SELECT finish();
ROLLBACK;
1..3
ok 1 - pgTAP 1.3
ok 2 - Column public.mytbl.geom should be type postgis.geometry(Polygon,4326)
not ok 3 - Column public.mytbl.geom should be type postgis.geometry(Polygon,4326)
# Failed test 3: "Column public.mytbl.geom should be type postgis.geometry(Polygon,4326)"
# have: postgis.geometry(Polygon,4326)
# want: postgis.geometry(Polygon,4326)(Polygon,4326)
# Looks like you failed 1 test of 3
Failed 1/3 subtests
In this case the qualified type postgis.geometry(Polygon,4326)
works. Adding postgis
to the search path also makes test case #3 pass (regardless if the type is qualified or not).
But my original test with interval second(0)
does not magically work when qualifying the type with pg_catalog
:
SELECT col_type_is('t', 'i', 'pg_catalog.interval second(0)');
ERROR: syntax error at or near "second"
CONTEXT: invalid type name "pg_catalog.interval second(0)"
PL/pgSQL function _typename(name) line 2 at RETURN
PL/pgSQL function _quote_ident_like(text,text) line 5 during statement block local variable initialization
PL/pgSQL function col_type_is(name,name,name,text,text) line 20 at assignment
But it's not even possible to create a table with pg_catalog.interval second(0)
or pg_catalog.timestamp(3) without time zone
:
postgres=# CREATE TABLE t (i pg_catalog.interval second(0));
ERROR: syntax error at or near "second"
LINE 1: CREATE TABLE t (i pg_catalog.interval second(0));
Looks a general issue with type names consisting of multiple keywords regardless of whether the keywords are part of the type name or type modifier: timestamp without time zone
is the actual type name (alias of timestamp
), whereas interval second
is a type name plus modifier. But standard SQL types are resolved differently anyway: https://www.postgresql.org/message-id/1819966.1692278294%40sss.pgh.pa.us.
CREATE TABLE t (
f0 timestamp,
f1 timestamp without time zone,
f2 timestamptz,
f3 timestamp with time zone,
f4 interval,
f5 interval second,
f6 interval year
);
SELECT
attname,
atttypid,
atttypmod,
format_type(atttypid, atttypmod)
FROM
pg_attribute
WHERE
attrelid = 't'::regclass
AND attnum >= 1;
attname | atttypid | atttypmod | format_type
---------+----------+-----------+-----------------------------
f0 | 1114 | -1 | timestamp without time zone
f1 | 1114 | -1 | timestamp without time zone
f2 | 1184 | -1 | timestamp with time zone
f3 | 1184 | -1 | timestamp with time zone
f4 | 1186 | -1 | interval
f5 | 1186 | 268500991 | interval second
f6 | 1186 | 327679 | interval year
(7 rows)
Apologies for the delay getting back to this issue; I've been distracted by other projects.
I tracked down the tyemod for interval second(0)
from @mathemancer's example (268435456
), and I think I better understand the issue now:
david=# select format_type('interval second(0)'::regtype, -1);
format_type
-------------
interval
(1 row)
david=# select format_type('interval second(0)'::regtype, 268435456);
format_type
--------------------
interval second(0)
(1 row)
So with the typemod one gets the whole thing. Poking around the source, I found that format_type()
calls a specific typemod formatting function for some types:
david=# select typname, typmodout from pg_type where typmodout <> 0;
typname | typmodout
--------------+----------------------
bpchar | bpchartypmodout
varchar | varchartypmodout
time | timetypmodout
timestamp | timestamptypmodout
timestamptz | timestamptztypmodout
interval | intervaltypmodout
timetz | timetztypmodout
bit | bittypmodout
varbit | varbittypmodout
numeric | numerictypmodout
_bpchar | bpchartypmodout
_varchar | varchartypmodout
_time | timetypmodout
_timestamp | timestamptypmodout
_timestamptz | timestamptztypmodout
_interval | intervaltypmodout
_timetz | timetztypmodout
_bit | bittypmodout
_varbit | varbittypmodout
_numeric | numerictypmodout
(20 rows)
Sure enough, when I call intervaltypmodout()
with the interval second(0)
typemod, I get:
david=# select intervaltypmodout(268435456);
intervaltypmodout
-------------------
second(0)
(1 row)
I had no idea that words could be part of the typemod! I think this is the core of the issue. Proper to v1.2.0 (and prior to #294), _quote_ident_like
didn't change the value passed to it. That meant one always had to declare the full name of a type. With this change, it changes changes it to the canonical type name in order to allow aliases. But without the full typemod things don't quite work.
I agree with you, @ewie, that col_type_is
should specifically test for full types in table columns, and that this is the only place that typemods matter. Creating a function that uses one, for example, drops the typemod!
david=# create function foo(f interval second(0)) returns void language sql as '';
CREATE FUNCTION
david=# \df foo
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+------+------------------+---------------------+------
public | foo | void | f interval | func
So I think it comes down to, as you said, at least fixing col_type_is()
, and not having to bother with everything else where typemods don't matter. I see three possibilities:
- Restore the pre-1.3.0 behavior of custom formatting data types, though only for
col_type_is()
. Would lose the ability to use aliases, however. - Fall back on comparing the hand-quoted types if the canonical type formatting fails, as suggested in #316. Probably the least painful and preserves aliases, but doesn't take advantage of Postgres-supplied type formatting functions.
- Figure out some way to derive the typemod from a type name. There are the
typmodin
procedures defined inpg_type
; maybe they could be used? Or, hey, maybe there's some function that's just the integers offormat_type()
? I'm willing to poke around a bit to find out, but it may not exist.
Option 2 is probably best at this point, simplest and would allow for a quick release. I'll poke a bit at 3 now, but look for a fix one way or the other in the next few days.
Hi @theory, thanks for looking into a solution.
- Figure out some way to derive the typemod from a type name. There are the
typmodin
procedures defined inpg_type
; maybe they could be used? Or, hey, maybe there's some function that's just the integers offormat_type()
? I'm willing to poke around a bit to find out, but it may not exist.
I checked Postgres' src/include/catalog/pg_proc.dat
but haven't found any undocumented function in addition to format_type()
.
Using the typmodin functions found in pg_type
does work, except for interval second (0)
(explanation at the end).
Here's a PoC. It's basically what _quote_ident_like()
already does with pcision
. The value just needs to be formatted as a cstring[]
literal, so replacing ()
with {}
.
BEGIN;
CREATE FUNCTION resolve_type_name(want_type text)
RETURNS text
LANGUAGE plpgsql
AS $$
DECLARE
typmodin_arg cstring[];
typmodin_func regproc;
typmod int;
BEGIN
IF want_type::regtype = 'interval'::regtype THEN
RAISE NOTICE 'cannot resolve: %', want_type; -- TODO
RETURN want_type;
END IF;
-- Extract type modifier from type declaration and format as cstring[] literal.
typmodin_arg := translate(substring(want_type FROM '[(][^")]+[)]'), '()', '{}');
-- Find typmodin function for want_type.
SELECT typmodin
INTO typmodin_func
FROM pg_type
WHERE oid = want_type::regtype;
IF typmodin_func = 0 THEN
-- Easy: types without typemods.
RETURN format_type(want_type::regtype, 0);
ELSE
-- Get typemod via type-specific typmodin function.
EXECUTE format('SELECT %I(%L)', typmodin_func, typmodin_arg) INTO typmod;
RETURN format_type(want_type::regtype, typmod);
END IF;
END $$;
CREATE EXTENSION IF NOT EXISTS postgis;
SELECT
want_type,
resolve_type_name(want_type)
FROM (
VALUES ('int'), ('int4'), ('integer'),
('float'), ('double precision'),
('real'), ('float4'),
('varbit(23)'), ('bit varying (42)'),
('timestamptz (3)'), ('timestamp (3) with time zone'),
('interval second (0)'),
('geometry(point, 4326)')
) t (want_type);
ROLLBACK;
BEGIN
CREATE FUNCTION
psql:typname.sql:36: NOTICE: extension "postgis" already exists, skipping
CREATE EXTENSION
psql:typname.sql:49: NOTICE: cannot resolve: interval second (0)
want_type | resolve_type_name
------------------------------+-----------------------------
int | integer
int4 | integer
integer | integer
float | double precision
double precision | double precision
real | real
float4 | real
varbit(23) | bit varying(23)
bit varying (42) | bit varying(42)
timestamptz (3) | timestamp(3) with time zone
timestamp (3) with time zone | timestamp(3) with time zone
interval second (0) | interval second (0)
geometry(point, 4326) | geometry(Point,4326)
(13 rows)
ROLLBACK
But it does not solve interval [fields] (p)
. The problem here is that Postgres already resolves fields
to a bitmask when parsing the type declaration. That bitmask is then passed to intervaltypmodin()
. I included geometry(point, 4326)
above to show that typmodin
functions certainly can accept arbitrary strings.
I assume those bitmasks won't change but pgTAP parsing interval [fields] (p)
and constructing the bitmask would be really hackish. For example, second
is 1 << 12
:
SELECT intervaltypmodout(intervaltypmodin('{4096,0}'));
intervaltypmodout
-------------------
second(0)
(1 row)
Oh very nice. So close! What function changes interval second(0)
to {4096,0}
, though? Is it only interval types that have this kind of intermediary processing? Some sort of special casing in the source?
The bitmask translation happens in the grammar (expressions opt_interval
and interval_second
). I think interval
is the only type with this special handling on the grammar level where the typmod is based on keywords.
But we don't even have to deal with it on that level because I've found function parseTypeString()
which should handle all cases. I can use that in an extension function written in C. Just writing a PoC now. My first extension in C!. I'll let you know.
Function exists back in 9.1, so it would work for all supported versions.
Here's my PoC: ewie/type_parser. Just tested on pg15.
Writing the C extension was easier than I expected. The C code may not be perfect though. Not sure if I have to free any memory (especially the result of text_to_cstring()
) or if Postgres always frees it automatically. I checked some of the extension code in contrib
but haven't found anything that strikes me.
Super cool, @ewie. The tests failed for me on 16, I suspect because sort ordering changes with ICU:
@@ -103,14 +103,14 @@
text[][] | text[]
time | time without time zone
time (3) with time zone | time(3) with time zone
+ time with time zone | time with time zone
+ time without time zone | time without time zone
timestamp | timestamp without time zone
- timestamp (3) without time zone | timestamp(3) without time zone
timestamp (3) with time zone | timestamp(3) with time zone
+ timestamp (3) without time zone | timestamp(3) without time zone
timestamptz (3) | timestamp(3) with time zone
timetz | time with time zone
timetz (3) | time(3) with time zone
- time without time zone | time without time zone
- time with time zone | time with time zone
varbit (23) | bit varying(23)
varchar (42) | character varying(42)
(51 rows)
Probably best fixable by adding an integer column or something.
The docs for text_to_cstring() say that the returned value is palloc
d, so presumably it must be pfree
d.
Would you mind if I swiped some of this code for pgTAP?
I take it back, pfree
does not need to be called. See, e.g., to_regtype.
The tests failed for me on 16, I suspect because sort ordering changes with ICU:
Collation C should give a stable ordering. My regression database was created with en_US.UTF-8.
SELECT
t.type_str,
format_type
FROM t
CROSS JOIN parse_type_string(t.type_str) p
CROSS JOIN format_type(p.typid, p.typmod)
ORDER BY
t.type_str COLLATE "C",
format_type COLLATE "C";
I take it back, pfree does not need to be called. See, e.g., to_regtype.
I was about to write the exact same thing after looking at to_regtype
;) I vaguely remember that the memory context deals with that. But I guess there are some cases where pfree
is necessary, otherwise it wouldn't be part of the API.
Would you mind if I swiped some of this code for pgTAP?
Not at all! Be my guest.
Thanks David! I learnt a lot of Postgres internals with this fix. (Sorry for the late reply, I haven't checkrf GitHub in a while.)
I'm no longer with Amazon, but I wouldn't expect them to balk at this bit of C code.
FWIW, I believe the only safe way to handle typmod is via the in/out functions (though maybe there's a few types that are excluded, like varchar()
).
Is there a reason why col_type_is()
can't compare the output of format_type()
?
I realize there's a shitload of history go to through here, @decibel, but the upshot is that there was no way to reliably have Postgres format types where words are part of the typed. This comment is where I realized that.
We could perhaps use some variant of this hack that @ewie came up with, but I liked being able to get it directly from Postgres better. If only parseTypeString()
was exposed in SQL!
Oh wow, didn't realize interval typmod processing was part of the grammar. UGH.
As for me, I took a break and just started job hunting a week or two ago, so not sure what's next yet!