Searching by Language Families - select pull down menu doesn't show the number of involved societies
@xrotwang This issue is caused by the fact that in the SQL table languagefamily the column language_count wasn't updated after the import. The model class LanguageFamily defines such a function:
https://github.com/D-PLACE/dplace/blob/master/dplace_app/models.py#L300
What do you think, would it be good to call this routine before:
https://github.com/D-PLACE/dplace/blob/master/dplace_app/loader/glottocode.py#L24
or do you see a better place? I'm thinking of a general place where all necessary table updates can be called.
Do it directly in the query rather than precomputing? It's not going to save that much database effort, surely?
from django.db.models import Count
qset = LanguageFamily.objects.all().annotate(language_count=Count('language'))
Yes, that's one option, but the data is static once all data sets are loaded, so it would be better to pre-calculate as much as possible to simplify and speed up the code.
I'm not sure that it's always better to pre-calculate. The trade-off is increased complexity of the loading code - I obviously missed this bit when refactoring.
Hmm, so far we have 186 language families, and running a count 186 times just to open a select menu seems like a lot. Maybe it could be done so that only the first user after a fresh load triggers the update, which then saves the counts in the database for the next call.
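The "compute once on first request, reuse afterwards" idea could be sketched roughly as follows. This is only an illustration: it caches in process memory with functools.lru_cache rather than writing the counts back to the database as suggested above, and compute_counts_from_db, family_counts and invalidate_family_counts are hypothetical names that do not exist in dplace_app.

```python
from functools import lru_cache

# Tracks how often the (hypothetical) expensive computation actually runs.
CALLS = {"n": 0}


def compute_counts_from_db():
    """Hypothetical stand-in for the per-family counting queries."""
    CALLS["n"] += 1
    return {"Family A": 3, "Family B": 1}


@lru_cache(maxsize=1)
def family_counts():
    """Compute the counts on the first call, then serve the cached result."""
    return compute_counts_from_db()


def invalidate_family_counts():
    """Would need to be called after a fresh data load (assumption)."""
    family_counts.cache_clear()


family_counts()  # first caller triggers the computation
family_counts()  # later callers reuse the cached dict
```

The price is the invalidation step: if a reload forgets to clear the cache, the menu shows stale numbers, which is the same kind of failure this issue is about.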
@Bibiko But I think you are right in this case. Throwing in the lines
for family in families.values():
    family.update_counts()
at the end of load_languages should do the trick.
Well, if one wants to do this dynamically for all families, a single group by
query would give all numbers.
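To make the single group-by idea concrete, here is a minimal sketch using plain sqlite3 instead of the Django ORM. The table and column names (languagefamily, language.family_id, society.language_id) are assumptions chosen to resemble the models discussed here, not the actual D-PLACE schema, and the sketch ignores the extra condition in update_counts that a society must have at least one value.

```python
import sqlite3


def society_counts_by_family(conn):
    """Return {family name: number of societies} using one GROUP BY query."""
    rows = conn.execute(
        """
        SELECT f.name, COUNT(s.id)
        FROM languagefamily AS f
        LEFT JOIN language AS l ON l.family_id = f.id
        LEFT JOIN society AS s ON s.language_id = l.id
        GROUP BY f.id, f.name
        ORDER BY f.name
        """
    ).fetchall()
    return dict(rows)


# Tiny in-memory data set, made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE languagefamily (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE language (id INTEGER PRIMARY KEY, family_id INTEGER);
    CREATE TABLE society (id INTEGER PRIMARY KEY, language_id INTEGER);
    INSERT INTO languagefamily VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO language VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO society VALUES (1, 1), (2, 1), (3, 2), (4, 3);
    """
)
counts = society_counts_by_family(conn)
print(counts)
```

The LEFT JOINs keep families without any societies in the result with a count of 0, so every entry of the select menu still gets a number.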
yes - but I would go with the update in load_languages.
Just did a quick test: Looping over all LanguageFamily objects, including the society count, like this
a = 0
for f in models.LanguageFamily.objects.all().annotate(language_c=Count('language__societies')):
    a += f.language_c
print(a)
clocks in at < 0.02 sec. In some runs it was up to 50% slower than looping without the count (which runs between 0.012 and 0.016 sec). Overall I'd say: Do it dynamically.
After all, premature optimization ...
If we factor in that Glottolog is now the only language classification scheme we use, this will cut the LanguageFamily model down from
class LanguageFamily(models.Model):
    scheme = models.CharField(max_length=1, choices=CLASSIFICATION_SCHEMES, default='G')
    name = models.CharField(max_length=50, db_index=True)
    language_count = models.IntegerField(default=0, null=False)

    def update_counts(self):
        self.language_count = 0
        for society in Society.objects.all().filter(language__family=self):
            if society.value_set.count() > 0:
                self.language_count += 1
        self.save()
to
class LanguageFamily(models.Model):
    name = models.CharField(max_length=50, db_index=True)