Indicia-Team/warehouse

Deleted samples present in ElasticSearch


Samples with IDs such as 8029261, 11868076, 11966510 and 18588200 are all marked as deleted.
They are not in the ElasticSearch occurrence index, but they are still present in the ElasticSearch sample index.

We need to:

  1. confirm that records are removed from the sample index when they are deleted
  2. remove the deleted records that are already in the sample index and should not be there.

The same issue has come up again in BiologicalRecordsCentre/ABLE#546

I've tracked it down to the Logstash configuration.

In samples-http-indicia.conf each new record is indexed with a unique document ID, set by document_id => "iBRCSMP%{id}"
In samples-http-indicia-deletions.conf we try to delete records using document_id => "brc1|%{id}", which never matches an indexed document, so the deletions have no effect.
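
To illustrate why the deletions never took effect, here is a minimal Python sketch of the mismatch. It is not part of the warehouse or Logstash code: the render and deletion_hits helpers are hypothetical, and render only imitates Logstash's %{field} substitution in a simplified way. The two templates and the sample IDs are the ones quoted in this issue.

```python
import re

# Templates quoted in this issue; %{id} is Logstash's field-reference syntax.
INDEX_TEMPLATE = "iBRCSMP%{id}"   # samples-http-indicia.conf
DELETE_TEMPLATE = "brc1|%{id}"    # samples-http-indicia-deletions.conf, before the fix


def render(template: str, event: dict) -> str:
    """Hypothetical, simplified stand-in for Logstash's %{field} substitution."""
    return re.sub(r"%\{(\w+)\}", lambda m: str(event[m.group(1)]), template)


def deletion_hits(sample_ids, index_template, delete_template):
    """Return the sample IDs whose delete request targets an existing document ID."""
    indexed = {render(index_template, {"id": i}) for i in sample_ids}
    return [i for i in sample_ids
            if render(delete_template, {"id": i}) in indexed]


deleted_samples = [8029261, 11868076, 11966510, 18588200]

# Before the fix: the delete IDs never match the IDs used at index time.
before_fix = deletion_hits(deleted_samples, INDEX_TEMPLATE, DELETE_TEMPLATE)

# After the fix: both configs use the same template, so every delete matches.
after_fix = deletion_hits(deleted_samples, INDEX_TEMPLATE, INDEX_TEMPLATE)

print(before_fix)
print(after_fix)
```

With the original templates no delete request can hit a document, because an ID starting with brc1| can never equal one starting with iBRCSMP. Once both files share a template, every deleted sample is matched.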

I will:

  • update the document_id in samples-http-indicia-deletions.conf to be the same as in samples-http-indicia.conf
  • delete the rest-autofeed-BRCSMPDEL record from the variables table, which restarts the deletion process so that it scans the entire index.

Note this only affects the sample index, not the occurrence index, so most reports/downloads are not affected.

Note

  • I've reduced the interval between sample deletion requests from 15 minutes to 5 so that the scan of the whole index completes more quickly.
  • The Logstash service has to be restarted for config changes to apply.

Successfully completed and working as expected.