digitalfabrik/integreat-cms

Management command for finding links in latest version of translations

Closed this issue · 2 comments

Motivation

After a content migration, every link needs to be checked once to find out which ones are broken.
We use django-linkcheck for this task, which provides the management command integreat-cms-cli findlinks. The problem is that this command searches for links in all versions of all translations, so it takes very long to run on large databases (and we might run into memory bottlenecks). Since the broken link checker only shows links that belong to the newest versions, we should not waste resources on finding links that are invisible in the CMS.

Proposed Solution

We should limit the findlinks command to the newest version of each translation.
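To make the intended selection concrete, here is a minimal sketch in plain Python, without Django or django-linkcheck. The Translation record and its fields (page_id, language, version, content) are assumptions for illustration and not the actual CMS model. It only shows the rule "keep the highest version per page and language"; a real command would express this as a database query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    # Hypothetical stand-in for a page translation row, not the real model
    page_id: int
    language: str
    version: int
    content: str


def latest_versions(translations):
    """Return only the newest version for each (page, language) pair."""
    latest = {}
    for translation in translations:
        key = (translation.page_id, translation.language)
        if key not in latest or translation.version > latest[key].version:
            latest[key] = translation
    return list(latest.values())


translations = [
    Translation(1, "en", 1, '<a href="https://old.example.com">old</a>'),
    Translation(1, "en", 2, '<a href="https://new.example.com">new</a>'),
    Translation(1, "de", 1, '<a href="https://de.example.com">de</a>'),
]

# Only these objects would need to be scanned for links
newest = latest_versions(translations)
```

With this rule, version 1 of the English translation is skipped and its link is never collected.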

Alternatives

Just increase the hardware specifications of the database server (both RAM and hard disk size) and wait for the normal command to finish...

Additional Context

The current implementation can be found here:
https://github.com/DjangoAdminHackers/django-linkcheck/blob/7c4d174e0b278e6e42e9a189324b200229ef33ba/linkcheck/utils.py#L142-L174

We could also define object_filter/object_exclude on our page_translation linkchecker model, so that all older versions are ignored: https://github.com/DjangoAdminHackers/django-linkcheck/blob/7c4d174e0b278e6e42e9a189324b200229ef33ba/linkcheck/__init__.py#L108-L123
Unfortunately, I think we would need an extra column in our page_translation model that marks the most recent version, so that we could set something like object_filter = {'active': True}
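A rough sketch of how such a flag could behave, again in plain Python rather than against the real models. The active field, the add_version helper and the apply_object_filter helper are all hypothetical; the last one only imitates a keyword filter like the one object_filter would pass to the queryset, it is not django-linkcheck code.

```python
from dataclasses import dataclass


@dataclass
class TranslationVersion:
    # Hypothetical row with the extra "active" column discussed above
    page_id: int
    language: str
    version: int
    active: bool = False


def add_version(versions, page_id, language):
    """Append a new version and mark all older siblings as inactive."""
    siblings = [
        v for v in versions if v.page_id == page_id and v.language == language
    ]
    for sibling in siblings:
        sibling.active = False
    next_version = max((v.version for v in siblings), default=0) + 1
    new = TranslationVersion(page_id, language, next_version, active=True)
    versions.append(new)
    return new


def apply_object_filter(versions, object_filter):
    """Imitate an exact-match keyword filter such as {'active': True}."""
    return [
        v
        for v in versions
        if all(getattr(v, key) == value for key, value in object_filter.items())
    ]


versions = []
add_version(versions, 1, "en")
add_version(versions, 1, "en")
add_version(versions, 1, "de")

visible = apply_object_filter(versions, {"active": True})
```

The cost of this approach is that every save has to update the older rows, which is why an extra column would be needed in the first place.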

@melegiul I like your idea, let's wait for DjangoAdminHackers/django-linkcheck#114 before we proceed with this issue.