jeffdc/gallformers

Investigate and address data loss

Megachile opened this issue · 2 comments

As of April 18, users noticed that all species and genera in the family Tephritidae no longer appeared in search results within the site or in ID tool query outputs, or on the pages of hosts associated with those galls. Comparing the database before and after the data loss shows that the genus and species records were in fact deleted from the database.

Strangely, the gall pages (which are looked up by species ID, not gall ID, in the db) still load from Google results or from a direct URL. This may be a caching artifact, given that the species no longer exist in the database.

The goals here are to identify the cause if possible and prevent it from recurring. If we cannot prevent it, we should design a system that lets us notice and conveniently fix it when it happens again. That may involve 1) some kind of log of deletions or 2) simply a data backup and discrepancy alert system. If we decide to move forward with one or both of those mechanisms, we would create new issues accordingly.
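To make option 2 concrete, here is a minimal sketch of a discrepancy check between two database snapshots. It is not based on the real schema: the `species` table, its `family` column, the 50% threshold, and the function names are all assumptions for illustration, and it uses in-memory SQLite only so that it runs on its own.

```python
import sqlite3


def make_snapshot(rows):
    """Build an in-memory stand-in for a backup (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT, family TEXT)"
    )
    conn.executemany("INSERT INTO species (name, family) VALUES (?, ?)", rows)
    return conn


def count_by_family(conn):
    """Return {family: number of species} for one snapshot."""
    rows = conn.execute(
        "SELECT family, COUNT(*) FROM species GROUP BY family"
    ).fetchall()
    return dict(rows)


def find_discrepancies(before, after, max_loss_fraction=0.5):
    """Flag families that lost more than max_loss_fraction of their species."""
    alerts = []
    for family, old_count in before.items():
        new_count = after.get(family, 0)  # family may be missing entirely
        lost = old_count - new_count
        if old_count > 0 and lost / old_count > max_loss_fraction:
            alerts.append((family, old_count, new_count))
    return sorted(alerts)


# Placeholder rows, not real records.
before = make_snapshot([
    ("species_1", "Tephritidae"),
    ("species_2", "Tephritidae"),
    ("species_3", "ExampleFamily"),
])
after = make_snapshot([("species_3", "ExampleFamily")])

alerts = find_discrepancies(count_by_family(before), count_by_family(after))
for family, old_count, new_count in alerts:
    print(f"ALERT: {family} went from {old_count} to {new_count} species")
```

A check like this could run after each backup and notify admins when a family shrinks sharply. The right threshold and grouping level (family, genus, host) would need to be decided in the follow-up issue.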

Upon further investigation, what happened appears to be consistent with the following admin user action:

On the Taxonomy tab of the admin UI, typing in and selecting family Tephritidae, then selecting all the genera (there used to be 13) and clicking delete:

image

This would theoretically have generated 13 queries to the db, one for each genus. Each query has two parts: one that deletes all the species in the genus, and one that deletes the genus itself. This matches what happened, in that the family itself still exists but contains no genera.
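The sketch below reproduces the hypothesized sequence against a toy in-memory SQLite database to show that it leaves exactly the observed state. The table and column names, the placeholder genus and species names, and the use of a transaction around the two statements are assumptions for illustration, not the site's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE family  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE genus   (id INTEGER PRIMARY KEY, name TEXT, family_id INTEGER);
    CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT, genus_id INTEGER);
""")

# One family with 13 placeholder genera, each holding one placeholder species.
conn.execute("INSERT INTO family (id, name) VALUES (1, 'Tephritidae')")
for genus_id in range(1, 14):
    conn.execute(
        "INSERT INTO genus (id, name, family_id) VALUES (?, ?, 1)",
        (genus_id, f"genus_{genus_id}"),
    )
    conn.execute(
        "INSERT INTO species (name, genus_id) VALUES (?, ?)",
        (f"species_of_genus_{genus_id}", genus_id),
    )


def delete_genus(conn, genus_id):
    """Two-part delete: first the species in the genus, then the genus."""
    with conn:  # assumption: both statements commit or roll back together
        conn.execute("DELETE FROM species WHERE genus_id = ?", (genus_id,))
        conn.execute("DELETE FROM genus WHERE id = ?", (genus_id,))


# Selecting every genus in the family and deleting: one call per genus.
genus_ids = [
    row[0] for row in conn.execute("SELECT id FROM genus WHERE family_id = 1")
]
for genus_id in genus_ids:
    delete_genus(conn, genus_id)

remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("family", "genus", "species")
}
print(remaining)
```

Nothing in the two-part delete touches the family row, so the family survives with no genera under it, which is the state seen in the database after the data loss.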

image

The main open question with the admin-action hypothesis is that the deletion should, in theory, also have triggered a cache update that removed the associated pages, and that did not happen. Instead, the pages are missing from search but still load.

We also need to restore the original data at some point. Should that be a separate issue?