Dynamic list subpartitioning (rough proposal)
rgupta33 opened this issue · 0 comments
django-postgres-extra
makes it easy to partition by a single foreign key field. It would be nice to add better support for subpartitioning. To walk through an example, we can start with this simple model:
class MyModel(...):
id = UUIDField(..., primary_key=True)
amount = DecimalField(...)
# fields we want to partition by
account = ForeignKeyField(...)
sub_account = ForeignKeyField(...)
Partitioning by a single field
Partitioning by the account
field is straightforward.
class MyModel(...):
class PartitioningMeta:
method = PostgresPartitioningMethod.LIST
key = ["account_id"]
id = UUIDField(..., primary_key=True)
amount = DecimalField(...)
# fields to partition by
account = ForeignKeyField(...)
sub_account = ForeignKeyField(...)
Whenever a new account is created, we call add_list_partition
.
def create_account(...):
account = Account.objects.create_model(...)
with transaction.atomic():
connection.schema_editor().add_list_partition(
model=MyModel,
name=account.id,
values=[account.id]
)
Current issues with subpartitioning
Now, say we want to partition each account
partition based on the sub_account
field.
class PartitioningMeta:
method = PostgresPartitioningMethod.LIST
key = ["account_id", "sub_account_id"]
Running python manage.py pgmakemigrations
will not cause any problems, but when running that migration with python manage.py migrate
, Postgres gives back an error: cannot use "list" partition strategy with more than one column
. Essentially, we can only partition by one field at a time.
Subpartitioning proposal
Add additional_pk_fields
Since Postgres requires that the field we are partitioning on is part of the table's primary key, we should be able to pass in additional PK fields for the initial creation of the table.
class PartitioningMeta:
method = PostgresPartitioningMethod.LIST
key = ["account_id"]
additional_pk_fields = ["sub_account_id"]
Alternatively, an easier but hackier approach would be to use first element of the key
list as the top-level partitioning key but to use all the elements of the key
list for the PK.
class PartitioningMeta:
method = PostgresPartitioningMethod.LIST
key = ["account_id", "sub_account_id"]
Either way, the generated SQL should contain:
- A composite PK on
id
,account_id
, andsub_account_id
PARTITION BY LIST (account_id)
Modify create_partition_table_name
to reference partitions
def create_partition_table_name(self, model: Type[Model], name: str, existing_partition_fields: list[str] | None = None) -> str:
params = [model._meta.db_table.lower(), *(existing_partition_fields or []), name.lower()]
return "_".join(["%s"] * len(params))
Example
# returns "mymodel_acct1_subacct1"
create_partition_table_name(model=MyModel, name="subacct1", existing_partition_fields=["acct1"]
Modify add_list_partition
def add_list_partition(
self,
model: Type[Model],
name: str,
values: List[Any],
existing_partition_fields: list[str] | None = None,
next_partition_field: str | None = None,
comment: Optional[str] = None,
) -> None:
meta = self._partitioning_properties_for_model(model)
table_name = self.create_partition_table_name(model, name, existing_partition_fields)
params = [
self.quote_name(table_name),
self.quote_name(model._meta.db_table) + (("_" + "_".join(existing_partition)) if existing_partition else ""),
",".join(["%s" for _ in range(len(values))]),
meta.method.upper() if next_partition_field is not None else None,
self.quote_name(next_partition_field) if next_partition_field is not None else None,
]
params = tuple([param for param in params if param is not None])
sql = (self.sql_add_list_partition + (self.sql_partition_by if next_partition_field is not None else "")) % params
with transaction.atomic():
self.execute(sql, values)
if comment:
self.set_comment_on_table(table_name, comment)
Call add_list_partition with the additional inputs
def create_account(...):
account = Account.objects.create_model(...)
with transaction.atomic():
connection.schema_editor().add_list_partition(
model=MyModel,
name=account.id,
values=[account.id],
next_partition_field="sub_account_id"
)
def create_sub_account(...):
sub_account = SubAccount.objects.create_model(...)
with transaction.atomic():
connection.schema_editor().add_list_partition(
model=MyModel,
name=sub_account.id,
values=[sub_account.id],
existing_partition_fields=["account_id"],
# could pass `next_partition_field` if further subpartitioning is needed
)