SectorLabs/django-postgres-extra

Temporary table support

Rjevski opened this issue · 0 comments

Hello,

I would like to open a discussion of whether we would like to add support for temporary table management in this package, and if so how we would do it.

For context, in the application I'm currently working on, we need to provide a large amount of data to an SQL query. This is potentially too large for a single SQL query, and in either case, composing such a query will be quite resource-intensive.

The pattern we've settled on is to create a temporary table, load the data into it, and then do a (data-less) query which uses that temporary table as part of its predicates (either by JOINing or using it within a IN subquery).

I see that this package already provides temporary schema management, but this is quite low-level and forces you to use raw SQL explicitly - there is no ORM support for those schemas/tables. It's quite impractical to then be limited to purely raw SQL so I am not using the existing temporary schema functions and instead:

  • override the SchemaEditor of the DB backend to provide a new create_temporary_model method. This is pretty much a copy of the Django create_model function, except it uses the temporary_table_sql method to get the SQL to execute.
  • provide the temporary_table_sql method, which is a copy of the Django table_sql method except it uses self.sql_create_temporary_table as its template for the SQL.
  • the sql_create_temporary_table template is defined as sql_create_temporary_table = "CREATE TEMPORARY TABLE %(table)s (%(definition)s)"

Then, a utility context manager is provided which takes an abstract model class, and on __enter__:

  • it subclasses the abstract model to assign it a random table name
  • it then gets the (previously overridden) schema editor to call those temporary table functions to create the temporary table for said model - this does what it would normally do in a migration and generates & executes the SQL needed to create the table from the model
  • finally it returns this new subclassed model which can now be used as any other Django model, allowing ORM access to it if needed (so you are no longer limited to exclusively using raw SQL to interact with it)

On __exit__ it deletes the temporary table, though that isn't strictly required as it will eventually be deleted when the DB connection gets recycled.

This implementation thus allows an application to have abstract models that aren't part of the migration graph and are only used as templates for temporary tables created at runtime, allowing both raw SQL and ORM access to them, with random table name generation and cleanup handled automatically as part of the context manager.

This is something I can submit a PR for (it would be beneficial for me as it would avoid me having to maintain those subclasses in my project) but I wanted to raise this discussion to see what others think of such an implementation.

Potential caveats:

  • although the subclassed model looks and behaves like a normal Django model, it is backed by a temporary table so certain functionality such as foreign keys might not be available or yield undefined behavior. There are no guardrails to explicitly detect and prevent "dangerous" features from being used, and those guardrails would be time-consuming to implement - therefore this should be documented as an advanced/experimental feature with less "polish" than the rest of the package.
  • the schema editor subclass effectively duplicates and copy/pastes the create_model and table_sql methods from Django - those methods are fairly large so would need to be kept up to date with any upstream Django changes (unfortunately there isn't a way to make any more granular changes to them).
  • the context manager subclassing the models at runtime and interacting with the schema editor isn't part of the DB backend import path so will need to find a suitable place to put it. This may not be an issue as I think this package already has utility symbols that aren't part of the DB backend and are meant to be imported into runtime code.

Any thoughts? Would this be an acceptable pattern to add to this package (if so I can PR it), and if not, what would be the desired pattern? I would appreciate for this package to provide some ORM-friendly way to deal with temporary tables so I can ditch the custom subclasses I have.