model-bakers/model_bakery

Use bulk_create for related objects with PostgreSQL, MariaDB 10.5+ and SQLite 3.35+

richardebeling opened this issue · 0 comments

In #206, bulk creation of related objects was introduced. However, with _quantity=N it still calls save() on the related class N times, which means N database queries. This comment claimed this was necessary to retrieve the IDs of the related objects.

According to the current Django documentation, bulk_create sets the primary key attribute of the created objects if the primary key is an AutoField and the database is PostgreSQL, MariaDB 10.5+ or SQLite 3.35+. I think this covers the majority of use cases, and it would be nice if users could benefit from the performance of bulk_create in these cases.
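At the SQL level, these backends have one thing in common: a single multi-row INSERT can hand back every generated key. SQLite 3.35+, MariaDB 10.5+ and PostgreSQL all support INSERT ... RETURNING. The sketch below uses Python's standard-library sqlite3 module rather than Django, so it only illustrates the database capability and does not show how Django's bulk_create is implemented. The table and function names are hypothetical.

```python
import sqlite3


def bulk_insert_returning_ids(conn, names):
    """Insert all names with ONE statement and return the generated primary keys.

    Relies on INSERT ... RETURNING, which SQLite supports from 3.35.0 onwards.
    """
    if not names:
        return []
    placeholders = ", ".join(["(?)"] * len(names))
    sql = f"INSERT INTO related (name) VALUES {placeholders} RETURNING id"
    # Consume the cursor fully so the statement runs to completion.
    return [row[0] for row in conn.execute(sql, names)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE related (id INTEGER PRIMARY KEY, name TEXT)")

if sqlite3.sqlite_version_info >= (3, 35, 0):
    ids = bulk_insert_returning_ids(conn, ["a", "b", "c"])
    print(sorted(ids))  # [1, 2, 3], all from a single INSERT
else:
    ids = None
    print("SQLite", sqlite3.sqlite_version, "has no INSERT ... RETURNING")
```

SQLite does not guarantee the order of rows emitted by RETURNING, which is why the example sorts the keys before printing them.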

To implement this, Django's database feature flag can_return_rows_from_bulk_insert (connection.features.can_return_rows_from_bulk_insert) could be used to check whether the backend supports it.
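Here is a minimal sketch of that dispatch using a hypothetical helper named create_related_objects. The features argument stands in for Django's connections[alias].features, and the helper only reads its can_return_rows_from_bulk_insert flag. In real code the two callables would be something like RelatedModel.objects.bulk_create and a per-object obj.save(). To keep the sketch runnable without a configured Django project, the usage below passes a types.SimpleNamespace and recording fakes instead.

```python
from types import SimpleNamespace


def create_related_objects(features, objs, bulk_create, save):
    """Persist related objects, batching them when the backend returns new PKs.

    Hypothetical helper: `features` mimics a Django connection's `features`
    object; `bulk_create` and `save` are supplied by the caller.
    """
    if features.can_return_rows_from_bulk_insert:
        # One INSERT; the backend reports each object's primary key.
        bulk_create(objs)
    else:
        # Fallback: one INSERT per object so that every PK is known.
        for obj in objs:
            save(obj)
    return objs


log = []


def fake_bulk_create(objs):
    log.append(("bulk_create", list(objs)))


def fake_save(obj):
    log.append(("save", obj))


create_related_objects(
    SimpleNamespace(can_return_rows_from_bulk_insert=True),
    ["u1", "u2", "u3"], fake_bulk_create, fake_save,
)
create_related_objects(
    SimpleNamespace(can_return_rows_from_bulk_insert=False),
    ["u4", "u5"], fake_bulk_create, fake_save,
)
print(log)
# [('bulk_create', ['u1', 'u2', 'u3']), ('save', 'u4'), ('save', 'u5')]
```

Keeping the per-object save() path as a fallback means backends without this capability (for example older SQLite or MySQL) behave exactly as they do today.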

Expected behavior

With these models:

from django.db import models

class RelatedModel(models.Model):
    pass

class MainModel(models.Model):
    related = models.ForeignKey(RelatedModel, on_delete=models.CASCADE)

I'd expect

from model_bakery import baker

N = 1000
baker.make(MainModel, _quantity=N, _bulk_create=True)

to execute O(1) instead of O(N) database queries.

Actual behavior

def test_bulk_create_multiple_fk(self):
    with self.assertNumQueries(6):
        baker.make(models.PaymentBill, _quantity=5, _bulk_create=True)

    assert models.PaymentBill.objects.all().count() == 5
    assert models.User.objects.all().count() == 5

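asserts that 6 = N + 1 database queries are executed for N = 5: one save() per related User object (5) plus one bulk INSERT for the PaymentBill objects.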
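Here is a back-of-the-envelope model of those counts for a model with a single foreign key. The related objects currently cost N queries and would cost one on a backend that returns rows from bulk inserts. Either way, one bulk INSERT follows for the main objects. The sketch deliberately ignores batching, since bulk_create may split a large insert into several queries depending on the backend's parameter limits. It also ignores any transaction or savepoint statements. The function name is hypothetical.

```python
def queries_for_make(n, can_return_rows_from_bulk_insert):
    """Estimated queries for baker.make(Model, _quantity=n, _bulk_create=True).

    Assumes one foreign key and no batching: the related objects are created
    first (n saves, or one bulk INSERT), then the main objects in one bulk INSERT.
    """
    related = 1 if can_return_rows_from_bulk_insert else n
    main = 1
    return related + main


print(queries_for_make(5, False))     # 6, as in assertNumQueries(6)
print(queries_for_make(1000, False))  # 1001, O(N)
print(queries_for_make(1000, True))   # 2, O(1)
```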


On a side note: The documentation gives this workaround:

If you want to avoid that, you’ll have to perform individual bulk creations per foreign keys as the following example:

from model_bakery import baker

baker.prepare(User, _quantity=5, _bulk_create=True)
user_iter = User.objects.all().iterator()
baker.prepare(Profile, user=user_iter, _quantity=5, _bulk_create=True)
  • I think these are supposed to be baker.make calls instead of baker.prepare calls(?)
  • If this works (it does for me), doesn't it disprove the claim that the save() calls are needed to retrieve the IDs? Note that the documentation does not state any requirements on the database or Django version. Edit: I just noticed that it iterates over all User objects, not just the ones that were created.
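The pitfall from the edit can be sketched with the standard-library sqlite3 module. A hypothetical auth_user table stands in for Django's User model. If some users already exist, pairing the new profiles with an iterator over all users attaches them to the wrong rows, while some of the freshly created users get no profile at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, name TEXT)")

# Two users that already existed before the workaround ran.
conn.executemany("INSERT INTO auth_user (name) VALUES (?)", [("old1",), ("old2",)])
# The "bulk creation": five new users, no primary keys read back.
conn.executemany(
    "INSERT INTO auth_user (name) VALUES (?)", [(f"new{i}",) for i in range(5)]
)

# Like User.objects.all().iterator(): walks ALL users, old ones included.
# (An unordered Django queryset has no guaranteed order; ORDER BY id only
# keeps this sketch deterministic.)
rows = conn.execute("SELECT id FROM auth_user ORDER BY id").fetchall()
user_iter = (row[0] for row in rows)
profile_user_ids = [next(user_iter) for _ in range(5)]  # one user per profile

new_user_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM auth_user WHERE name LIKE 'new%' ORDER BY id"
    )
]
print(profile_user_ids)  # [1, 2, 3, 4, 5]: ids 1 and 2 are the old users
print([i for i in new_user_ids if i not in profile_user_ids])  # [6, 7]
```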