Use bulk_create for related objects with PostgreSQL, MariaDB 10.5+ and SQLite 3.35+
richardebeling opened this issue · 0 comments
In #206, bulk creation of related objects was introduced. However, for _quantity=N, it will cause N calls to save() of the related class, causing N database queries. This comment claimed this was necessary to retrieve the IDs of the related objects.
According to the current Django documentation, bulk_create will update the primary key of the objects if the field type is AutoField and the database used is PostgreSQL, MariaDB 10.5+ or SQLite 3.35+. I think this covers the majority of use cases. It would be nice if users could profit from the performance of bulk_create in these cases.
For the implementation, Django's can_return_rows_from_bulk_insert database feature flag could be used to test whether this is supported.
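A minimal sketch of how such a check might look, assuming the flag is read from the features attribute of a Django database connection. The helper names supports_pk_from_bulk_insert and save_related are hypothetical and not part of model_bakery or Django; the connection and the model manager are passed in as arguments, so the sketch does not depend on a configured Django project.

```python
def supports_pk_from_bulk_insert(connection) -> bool:
    """Return True if the backend reports that bulk inserts return rows."""
    # `connection.features` is Django's per-backend feature object. Default to
    # False so that unknown backends take the per-object fallback path.
    return bool(
        getattr(connection.features, "can_return_rows_from_bulk_insert", False)
    )


def save_related(manager, instances, connection):
    """Persist unsaved related instances, in one query when the backend allows it.

    Hypothetical helper: `manager` is expected to behave like
    `RelatedModel.objects`, `instances` like unsaved model instances.
    """
    instances = list(instances)
    if supports_pk_from_bulk_insert(connection):
        # Single INSERT; on these backends Django sets the primary keys
        # on the objects returned by bulk_create.
        return manager.bulk_create(instances)
    # Fallback: one INSERT per object, which is the current behavior.
    for instance in instances:
        instance.save()
    return instances
```

In a Django project, the arguments would presumably be RelatedModel.objects and the connection object imported from django.db (or an entry of django.db.connections when several databases are configured).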
Expected behavior
With the models
    from django.db import models
    class RelatedModel(models.Model):
        pass
    class MainModel(models.Model):
        related = models.ForeignKey(RelatedModel, on_delete=models.CASCADE)
I'd expect
    N = 1000
    baker.make(MainModel, _quantity=N, _bulk_create=True)
to execute O(1) instead of O(N) database queries.
Actual behavior
model_bakery/tests/test_baker.py, lines 386 to 391 in 850aadc, asserts that 6 = N+1 database queries happen.
On a side note: The documentation gives this workaround:
If you want to avoid that, you’ll have to perform individual bulk creations per foreign keys as the following example:
    from model_bakery import baker
    baker.prepare(User, _quantity=5, _bulk_create=True)
    user_iter = User.objects.all().iterator()
    baker.prepare(Profile, user=user_iter, _quantity=5, _bulk_create=True)
- I think the calls are supposed to be baker.make calls instead of baker.prepare (?)
- If this works (it does for me), doesn't this falsify the claim that the save() calls are needed to retrieve the IDs? Note that it does not specify any requirements on the database or Django version used. Edit: I just saw it iterates all objects, not just the ones that were created.