django-bulk-update
Simple bulk update over Django ORM or with helper function.
This project aims to bulk update given objects using one query over Django ORM.
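As a rough illustration of what "one query" can mean, the sketch below updates several rows with a single UPDATE statement built around a CASE expression. It uses the standard-library sqlite3 module instead of Django so that it runs on its own. The build_bulk_update_sql helper and the person table are assumptions made for this example and are not part of django-bulk-update.

```python
import sqlite3


def build_bulk_update_sql(table, pk_column, column, values_by_pk):
    """Build one UPDATE statement that sets `column` per primary key.

    Hypothetical helper for illustration only; not the library's code.
    """
    pks = list(values_by_pk)
    cases = " ".join("WHEN ? THEN ?" for _ in pks)
    placeholders = ", ".join("?" for _ in pks)
    sql = (
        f"UPDATE {table} SET {column} = CASE {pk_column} {cases} END "
        f"WHERE {pk_column} IN ({placeholders})"
    )
    params = []
    for pk in pks:
        params.extend([pk, values_by_pk[pk]])  # parameters for each WHEN/THEN pair
    params.extend(pks)  # parameters for the IN (...) list
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

# Two rows are changed by a single statement; row 3 is left untouched.
sql, params = build_bulk_update_sql("person", "id", "name", {1: "Walter", 2: "Donny"})
cursor = conn.execute(sql, params)
rows = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
print(rows)
```

The WHERE clause restricts the statement to the listed primary keys, so rows outside the batch keep their values.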
Installation
pip install django-bulk-update
Usage
With manager:
import random
from django_bulk_update.manager import BulkUpdateManager
from django.db import models
class Person(models.Model):
    ...
    objects = BulkUpdateManager()  # gives Person.objects a bulk_update() method
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    person.name = random.choice(random_names)
Person.objects.bulk_update(people, update_fields=['name']) # updates only name column
Person.objects.bulk_update(people, exclude_fields=['username']) # updates all columns except username
Person.objects.bulk_update(people) # updates all columns
Person.objects.bulk_update(people, batch_size=50000) # updates all columns by 50000 sized chunks
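The update_fields and exclude_fields options above can be read as a simple selection over the model's columns. The sketch below mirrors the behaviour described in the usage comments. The select_fields helper and the example column list are hypothetical and do not reflect the library's internal code.

```python
def select_fields(all_fields, update_fields=None, exclude_fields=None):
    """Return the field names to write, preserving the order of all_fields.

    Hypothetical helper: follows the behaviour described in the usage
    comments, not the library's internal implementation.
    """
    if update_fields is None:
        chosen = list(all_fields)  # no restriction: start from every column
    else:
        wanted = set(update_fields)
        chosen = [f for f in all_fields if f in wanted]
    excluded = set(exclude_fields or [])
    return [f for f in chosen if f not in excluded]


fields = ["name", "username", "email"]  # assumed example columns

only_name = select_fields(fields, update_fields=["name"])
without_username = select_fields(fields, exclude_fields=["username"])
everything = select_fields(fields)
print(only_name, without_username, everything)
```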
With helper:
import random
from django_bulk_update.helper import bulk_update
from tests.models import Person
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    person.name = random.choice(random_names)
bulk_update(people, update_fields=['name']) # updates only name column
bulk_update(people, exclude_fields=['username']) # updates all columns except username
bulk_update(people, using='someotherdb') # updates all columns using the given db
bulk_update(people) # updates all columns using the default db
bulk_update(people, batch_size=50000) # updates all columns by 50000 sized chunks using the default db
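The batch_size option splits the objects into chunks, which keeps each statement below database limits such as the maximum number of SQL variables. A minimal sketch of that chunking idea follows. The chunked helper is hypothetical and is not the library's implementation.

```python
def chunked(items, batch_size=None):
    """Yield lists of at most batch_size items (hypothetical helper)."""
    items = list(items)
    if batch_size is None:
        if items:
            yield items  # no batching requested: one chunk with everything
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


batches = list(chunked(range(7), batch_size=3))
print(batches)
```

Each chunk would then be sent to the database as its own update statement, so the number of queries grows with the number of chunks instead of the number of objects.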
Note: Consider using .only('name') when you only want to update name, so that Django retrieves only the name data from the db.
Likewise, consider using .defer('username') when you don't want to update username, so that Django won't retrieve username from the db.
These optimizations can improve performance even more.
Performance Tests:
Here we test the performance of the bulk_update function vs. simply calling .save() on every object update (dmmy_update). The interesting metric is the speedup gained by using the bulk_update function, more than the actual raw times.
# Note: SQLite is unable to run the `timeit` tests
# due to the max number of sql variables
In [1]: import os
In [2]: import timeit
In [3]: import django
In [4]: os.environ['DJANGO_SETTINGS_MODULE'] = 'tests.test_settings'
In [5]: django.setup()
In [6]: from tests.fixtures import create_fixtures
In [7]: django.db.connection.creation.create_test_db()
In [8]: create_fixtures(1000)
In [9]: setup='''
import random
from django_bulk_update import helper
from tests.models import Person
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
ids = list(Person.objects.values_list('id', flat=True)[:1000])
people = Person.objects.filter(id__in=ids)
for p in people:
    name = random.choice(random_names)
    p.name = name
    p.email = '%s@example.com' % name
bu_update = lambda: helper.bulk_update(people, update_fields=['name', 'email'])
'''
In [10]: bu_perf = min(timeit.Timer('bu_update()', setup=setup).repeat(7, 100))
In [11]: setup='''
import random
from tests.models import Person
from django.db.models import F
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
ids = list(Person.objects.values_list('id', flat=True)[:1000])
people = Person.objects.filter(id__in=ids)
def dmmy_update():
    for p in people:
        name = random.choice(random_names)
        p.name = name
        p.email = '%s@example.com' % name
        p.save(update_fields=['name', 'email'])
'''
In [12]: dmmy_perf = min(timeit.Timer('dmmy_update()', setup=setup).repeat(7, 100))
In [13]: print('Bulk update performance: %.2f. Dummy update performance: %.2f. Speedup: %.2f.' % (bu_perf, dmmy_perf, dmmy_perf / bu_perf))
Bulk update performance: 7.05. Dummy update performance: 373.12. Speedup: 52.90.
Requirements
- Django 1.8+
Contributors
- aykut
- daleobrien
- sruon
- HowerHell
- c-nichols
- towr
- joshblum
- luzfcb
- torchingloom
- cihann
- wetneb
- tatterdemalion
- gabriel-laet
- arnau126
TODO
- Geometry Fields support
License
django-bulk-update is released under the MIT License. See the LICENSE file for more details.