simonw/django-sql-dashboard

If a column is called "group" (or other reserved words) the example link to the table doesn't work

simonw opened this issue · 11 comments

Clicking this link:

SQL__select_count____from_availability_tag_____select_id__name__notes__disabled__previous_names__slug__group_from_availability_tag

Links to this SQL which throws an error:

select id, name, notes, disabled, previous_names, slug, group from availability_tag

Error:

syntax error at or near "group" LINE 1: ... id, name, notes, disabled, previous_names, slug, group from... ^

Fix is to wrap the column name in double quotes:

select id, name, notes, disabled, previous_names, slug, "group" from availability_tag

Need a list of reserved SQL words - or some other mechanism for detecting them, maybe psycopg2 has something?

https://www.psycopg.org/docs/extensions.html#psycopg2.extensions.quote_ident

psycopg2.extensions.``quote_ident(str, scope)

Return quoted identifier according to PostgreSQL quoting rules.

The scope must be a connection or a cursor, the underlying connection encoding is used for any necessary character conversion.

That quotes identifiers whether or not they need to be quoted, which I find a little ugly. It gave me this result:

select "id", "name", "notes", "disabled", "previous_names", "slug", "group" from availability_tag

I'm optimistic about this to help solve the problem:

select * from pg_get_keywords()

It looks like the full list of reserved words (that are invalid if used as column names without double quotes) can be found with:

select string_agg(word, ', ') from pg_get_keywords() where catcode = 'R'

It's:

all, analyse, analyze, and, any, array, as, asc, asymmetric, both, case, cast, check, collate, column, constraint, create, current_catalog, current_date, current_role, current_time, current_timestamp, current_user, default, deferrable, desc, distinct, do, else, end, except, false, fetch, for, foreign, from, grant, group, having, in, initially, intersect, into, lateral, leading, limit, localtime, localtimestamp, not, null, offset, on, only, or, order, placing, primary, references, returning, select, session_user, some, symmetric, table, then, to, trailing, true, union, unique, user, using, variadic, when, where, window, with

I could incorporate that into this SQL query in a clever way:

tables_cursor.execute(
"""
with visible_tables as (
select table_name
from information_schema.tables
where table_schema = 'public'
order by table_name
)
select
information_schema.columns.table_name,
string_agg(column_name, ', ' order by ordinal_position) as columns
from
information_schema.columns
join
visible_tables on
information_schema.columns.table_name = visible_tables.table_name
where
information_schema.columns.table_schema = 'public'
group by
information_schema.columns.table_name
order by
information_schema.columns.table_name
"""
)

Or I could run it once at the start, pull the list of reserved words into Python and apply optional escaping in Python code.

Doing it with fancy SQL looks like it could work really well:

with visible_tables as ( 
  select table_name 
    from information_schema.tables 
    where table_schema = 'public' 
    order by table_name 
),
reserved_keywords as (
  select word
    from pg_get_keywords()
    where catcode = 'R'
  union select 'id' as word
)
select 
  information_schema.columns.table_name, 
  string_agg(column_name, ', ' order by ordinal_position) as columns,
  array_agg(case when column_name in (select word from reserved_keywords) then '"' || column_name || '"' else column_name end order by ordinal_position) as columns 
from 
  information_schema.columns 
join 
  visible_tables on 
  information_schema.columns.table_name = visible_tables.table_name 
where 
  information_schema.columns.table_schema = 'public' 
group by 
  information_schema.columns.table_name 
order by 
  information_schema.columns.table_name

(I added the union select 'id' as word bit just to ensure I had some columns that were treated as needing quoting)

Output:

SQL__with_visible_tables_as___select_table_name_from_information_schema_tables_where_table_schema____public__order_by_table_name____reserved_keywords_as___select_word_from_pg_get_keywords___where_catcode____R____select_information_schema_co

I also need to solve this for "sort by column" queries, see #57 - so I'm going to load the results of pg_get_keywords() once (and cache for the lifetime of the server) and implement this in Python instead.

Python 3.9 introduced a @functools.cache decorator - https://github.com/python/cpython/blob/3.9/Lib/functools.py#L650 - but under the hood it looks like this, and lru_cache has been in Python since 3.2:

def cache(user_function, /):
    'Simple lightweight unbounded cache.  Sometimes called "memoize".'
    return lru_cache(maxsize=None)(user_function)
In [3]: @functools.lru_cache(maxsize=None)
   ...: def cached():
   ...:     print("called")
   ...:     return 5
   ...: 

In [4]: 

In [4]: cached()
called
Out[4]: 5

In [5]: cached()
Out[5]: 5

That won't work because the function needs to be passed a connection. I'll roll my own dumb caching mechanism instead.

_reserved_words = None

def postgresql_reserved_words(connection):
    global _reserved_words
    if _reserved_words is None:
        with connection.cursor() as cursor:
            cursor.execute("select word from pg_get_keywords() where catcode = 'R'")
            _reserved_words = [row[0] for row in cursor.fetchall()]
    return _reserved_words