D-PLACE/dplace-legacy

Searching by Language Families - select pull down menu doesn't show the number of involved societies

Closed this issue · 11 comments

@xrotwang This issue is caused by the column language_count in the SQL table languagefamily not being updated after the import. The model class LanguageFamily defines a function for this:
https://github.com/D-PLACE/dplace/blob/master/dplace_app/models.py#L300

What do you think? Would it be a good idea to call this routine just before:
https://github.com/D-PLACE/dplace/blob/master/dplace_app/loader/glottocode.py#L24

Or do you see a better place? I'm thinking of a single, general place from which all necessary table updates can be called.

Do it directly in the query rather than precomputing? It's not going to save that much database effort, surely?

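from django.db.models import Count
from dplace_app.models import LanguageFamily
# Count languages per family at query time; Django rejects annotation names that clash with model fields, so this requires dropping the stored language_count field.
qset = LanguageFamily.objects.annotate(language_count=Count('language'))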

Yes, that's one option, but once all data sets are loaded we're dealing with static data, so it'd be better to pre-calculate as much as possible to simplify and speed up the code.

I'm not sure that it's always better to pre-calculate. The trade-off is increased complexity in the loading code; I obviously missed this bit when refactoring.

Hmm, so far we have 186 language families, which means running 186 counts just to open a select menu. Maybe one could arrange it so that only the first user after a fresh load triggers the update, which then saves the counts in the database for subsequent calls.
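The compute-on-first-use idea can be sketched as a small cache that runs the counting query once and reuses the result until it is invalidated after a fresh load. This is a minimal sketch under stated assumptions: it keeps the counts in process memory instead of writing them back to the database as suggested above, and `LazyFamilyCounts` and `fake_query` are hypothetical names, not part of D-PLACE.

```python
from typing import Callable, Dict, Optional


class LazyFamilyCounts:
    """Hypothetical helper: compute per-family counts on first access, then reuse them.

    `compute` stands in for whatever query produces the counts; it only runs on
    the first get() after construction or after invalidate().
    """

    def __init__(self, compute: Callable[[], Dict[str, int]]) -> None:
        self._compute = compute
        self._counts: Optional[Dict[str, int]] = None

    def get(self) -> Dict[str, int]:
        # Only the first caller pays for the query; later callers get the cached dict.
        if self._counts is None:
            self._counts = self._compute()
        return self._counts

    def invalidate(self) -> None:
        # Call after a fresh data load so the next get() recomputes.
        self._counts = None


calls = []


def fake_query() -> Dict[str, int]:
    # Hypothetical stand-in for the real counting query; records each invocation.
    calls.append(1)
    return {"Austronesian": 2, "Uralic": 0}


cache = LazyFamilyCounts(fake_query)
cache.get()
cache.get()
print(len(calls))  # 1
cache.invalidate()
cache.get()
print(len(calls))  # 2
```

Storing the result in the database instead would make it survive process restarts, at the cost of the extra write path in the loader that this thread is debating.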

@Bibiko But I think you are right in this case. Throwing in a line

for family in families.values():
    family.update_counts()

at the end of load_languages should do the trick.

Well, if one wants to do this dynamically for all families, a single GROUP BY query would give all the numbers.
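To illustrate the difference, here is a self-contained sketch using Python's built-in `sqlite3` with a toy schema; the table and column names are hypothetical simplifications of the Django tables. It compares one COUNT query per family with a single LEFT JOIN plus GROUP BY. Django's `annotate(Count(...))` on the family queryset produces a grouped query of roughly this shape.

```python
import sqlite3

# Hypothetical toy schema standing in for the Django tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languagefamily (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE language (id INTEGER PRIMARY KEY,
                           family_id INTEGER REFERENCES languagefamily(id));
    INSERT INTO languagefamily VALUES (1, 'Austronesian'), (2, 'Atlantic-Congo'), (3, 'Uralic');
    INSERT INTO language (family_id) VALUES (1), (1), (2);
""")

# Per-family approach: one COUNT query per family (186 queries for 186 families).
per_family = {}
for fam_id, name in conn.execute("SELECT id, name FROM languagefamily").fetchall():
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM language WHERE family_id = ?", (fam_id,)
    ).fetchone()
    per_family[name] = n

# Grouped approach: one query returns every count. The LEFT JOIN keeps families
# without languages, and COUNT(l.id) skips their NULL rows, so they get 0.
grouped = dict(conn.execute("""
    SELECT f.name, COUNT(l.id)
    FROM languagefamily f
    LEFT JOIN language l ON l.family_id = f.id
    GROUP BY f.id, f.name
""").fetchall())

print(per_family == grouped)  # True
```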

Yes, but I would go with the update in load_languages.

I just did a quick test, looping over all LanguageFamily objects with the society count annotated, like this:

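    from django.db.models import Count
    from dplace_app import models

    a = 0
    # language__societies follows Language -> Society, so language_c is the number of societies per family
    for f in models.LanguageFamily.objects.all().annotate(language_c=Count('language__societies')):
        a += f.language_c
    print(a)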

It clocks in at under 0.02 sec. In some runs it was up to 50% slower than looping without the count (which takes between 0.012 and 0.016 sec). Overall I'd say: do it dynamically.

After all,

premature optimization ...

If we factor in that Glottolog is now the only language classification scheme we use, this cuts the LanguageFamily model down from

class LanguageFamily(models.Model):
    scheme = models.CharField(max_length=1, choices=CLASSIFICATION_SCHEMES, default='G')
    name = models.CharField(max_length=50, db_index=True)
    language_count = models.IntegerField(default=0, null=False)

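    def update_counts(self):
        # Despite the field name, this counts the family's societies (not languages)
        # that have at least one coded value, issuing one COUNT query per society.
        self.language_count = 0
        for society in Society.objects.all().filter(language__family=self):
            if society.value_set.count() > 0:
                self.language_count += 1
        self.save()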

to

class LanguageFamily(models.Model):
    name = models.CharField(max_length=50, db_index=True)
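With the stored field gone, the number that update_counts used to precompute (societies in a family with at least one coded value, despite the name language_count) can still be obtained in one grouped query rather than one query per society. Below is a sketch in `sqlite3` with a hypothetical, simplified schema (`coded_value` stands in for whatever table holds society values). In Django the same idea would go through `annotate()` with `Count(..., distinct=True)`; the exact lookup path depends on the real related names and is not shown here.

```python
import sqlite3

# Hypothetical schema: family <- language <- society <- coded_value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languagefamily (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE language (id INTEGER PRIMARY KEY, family_id INTEGER);
    CREATE TABLE society (id INTEGER PRIMARY KEY, language_id INTEGER);
    CREATE TABLE coded_value (id INTEGER PRIMARY KEY, society_id INTEGER);
    INSERT INTO languagefamily VALUES (1, 'Austronesian'), (2, 'Uralic');
    INSERT INTO language VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO society VALUES (1, 1), (2, 1), (3, 2), (4, 3);
    -- societies 1 and 3 have values; societies 2 and 4 have none
    INSERT INTO coded_value (society_id) VALUES (1), (1), (3);
""")

# One statement for all families. COUNT(DISTINCT ...) counts each society once even
# if it has several values; societies without values contribute only NULLs, which
# COUNT ignores.
counts = dict(conn.execute("""
    SELECT f.name, COUNT(DISTINCT v.society_id)
    FROM languagefamily f
    LEFT JOIN language l ON l.family_id = f.id
    LEFT JOIN society s ON s.language_id = l.id
    LEFT JOIN coded_value v ON v.society_id = s.id
    GROUP BY f.id, f.name
    ORDER BY f.id
""").fetchall())
print(counts)  # {'Austronesian': 2, 'Uralic': 0}
```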