Dynamic list subpartitioning (rough proposal)

Question

Dynamic list subpartitioning (rough proposal)

rgupta33 opened this issue 5 months ago · 0 comments

django-postgres-extra makes it easy to partition by a single foreign key field. It would be nice to add better support for subpartitioning. To walk through an example, we can start with this simple model:

class MyModel(...):
    id = UUIDField(..., primary_key=True)
    amount = DecimalField(...)
    
    # fields we want to partition by
    account = ForeignKeyField(...)
    sub_account = ForeignKeyField(...)

Partitioning by a single field

Partitioning by the account field is straightforward.

class MyModel(...):
    class PartitioningMeta:
         method = PostgresPartitioningMethod.LIST
         key = ["account_id"]
    
    id = UUIDField(..., primary_key=True)
    amount = DecimalField(...)
    
   # fields to partition by
    account = ForeignKeyField(...)
    sub_account = ForeignKeyField(...)

Whenever a new account is created, we call add_list_partition.

def create_account(...):
    account = Account.objects.create_model(...)
    
    with transaction.atomic():
        connection.schema_editor().add_list_partition(
            model=MyModel,
            name=account.id,
            values=[account.id]
        )

Current issues with subpartitioning

Now, say we want to partition each account partition based on the sub_account field.

class PartitioningMeta:
         method = PostgresPartitioningMethod.LIST
         key = ["account_id", "sub_account_id"]

Running python manage.py pgmakemigrations will not cause any problems, but when running that migration with python manage.py migrate, Postgres gives back an error: cannot use "list" partition strategy with more than one column. Essentially, we can only partition by one field at a time.

Subpartitioning proposal

Add `additional_pk_fields`

Since Postgres requires that the field we are partitioning on is part of the table's primary key, we should be able to pass in additional PK fields for the initial creation of the table.

class PartitioningMeta:
         method = PostgresPartitioningMethod.LIST
         key = ["account_id"]
         additional_pk_fields = ["sub_account_id"]

Alternatively, an easier but hackier approach would be to use first element of the key list as the top-level partitioning key but to use all the elements of the key list for the PK.

class PartitioningMeta:
         method = PostgresPartitioningMethod.LIST
         key = ["account_id", "sub_account_id"]

Either way, the generated SQL should contain:

A composite PK on id, account_id, and sub_account_id
PARTITION BY LIST (account_id)

Modify `create_partition_table_name` to reference partitions

def create_partition_table_name(self, model: Type[Model], name: str, existing_partition_fields: list[str] | None = None) -> str:
        params = [model._meta.db_table.lower(), *(existing_partition_fields or []), name.lower()]
        return "_".join(["%s"] * len(params))

Example

# returns "mymodel_acct1_subacct1"
create_partition_table_name(model=MyModel, name="subacct1", existing_partition_fields=["acct1"]

Modify `add_list_partition`

def add_list_partition(
        self,
        model: Type[Model],
        name: str,
        values: List[Any],
        existing_partition_fields: list[str] | None = None,
        next_partition_field: str | None = None,
        comment: Optional[str] = None,
    ) -> None:
        meta = self._partitioning_properties_for_model(model)

        table_name = self.create_partition_table_name(model, name, existing_partition_fields)

        params = [
            self.quote_name(table_name),
            self.quote_name(model._meta.db_table) + (("_" + "_".join(existing_partition)) if existing_partition else ""),
            ",".join(["%s" for _ in range(len(values))]),
            meta.method.upper() if next_partition_field is not None else None,
            self.quote_name(next_partition_field) if next_partition_field is not None else None,
        ]
        params = tuple([param for param in params if param is not None])

        sql = (self.sql_add_list_partition + (self.sql_partition_by if next_partition_field is not None else "")) % params

        with transaction.atomic():
            self.execute(sql, values)

            if comment:
                self.set_comment_on_table(table_name, comment)

Call add_list_partition with the additional inputs

def create_account(...):
    account = Account.objects.create_model(...)
    
    with transaction.atomic():
        connection.schema_editor().add_list_partition(
            model=MyModel,
            name=account.id,
            values=[account.id],
            next_partition_field="sub_account_id"
        )

def create_sub_account(...):
    sub_account = SubAccount.objects.create_model(...)
    
    with transaction.atomic():
        connection.schema_editor().add_list_partition(
            model=MyModel,
            name=sub_account.id,
            values=[sub_account.id],
            existing_partition_fields=["account_id"],
            # could pass `next_partition_field` if further subpartitioning is needed
        )