isamplesorg/isamples_inabox

GEOME partial solr index updates

Closed this issue · 4 comments

To do partial Solr index updates for GEOME properly, we need to change how the modified date is plumbed through. The existing code assumes that we can infer the last modified date from the source JSON document via

    def last_updated_time(self) -> typing.Optional[typing.AnyStr]:
        """Return the time the record was last modified in the source collection"""

An example implementation from OpenContext is

    def last_updated_time(self) -> typing.Optional[typing.AnyStr]:
        return self.source_record.get("updated", None)

This value is then written to the Solr document as coreMetadata["sourceUpdatedTime"] = datetimeToSolrStr(date_time). When the Solr indexer starts up, we fetch the max value of this field and limit our Solr index updates to records in the db with tcreated > this value (this is the most precise way to account for any jitter or discrepancies between the two data stores).
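A minimal sketch of that cutoff logic, for illustration only. The helper names (`parse_solr_time`, `things_needing_reindex`), the dict-shaped records, and the whole-second UTC timestamp format are assumptions, not the actual indexer code; only the `tcreated` comparison comes from the description above.

```python
import datetime
import typing

# Assumed format: whole-second UTC timestamps such as 2022-03-01T00:00:00Z.
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_solr_time(value: str) -> datetime.datetime:
    """Parse a Solr-style UTC timestamp into a timezone-aware datetime."""
    parsed = datetime.datetime.strptime(value, SOLR_FORMAT)
    return parsed.replace(tzinfo=datetime.timezone.utc)


def things_needing_reindex(
    things: typing.Iterable[dict],
    max_source_updated: typing.Optional[str],
) -> typing.List[dict]:
    """Return the records whose tcreated is strictly after the Solr max value.

    `things` stands in for rows from the db; each is a dict with a
    timezone-aware `tcreated`. If the index is empty (no max value),
    every record is returned.
    """
    if max_source_updated is None:
        return list(things)
    cutoff = parse_solr_time(max_source_updated)
    return [thing for thing in things if thing["tcreated"] > cutoff]
```

Because the comparison is strict, a record whose `tcreated` equals the max value already in Solr is not re-indexed.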

This value isn't available in the GEOME source JSON as it lives at the Expedition level, so we need to pass it through to the Transformer from the Thing record at the point of creation.
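One way the pass-through could look, as a rough sketch. `ThingStub`, its `last_modified` field, `SketchGEOMETransformer`, and `transformer_for_thing` are all hypothetical stand-ins; the real Thing model and GEOME transformer may be shaped differently. The point is only that the date is handed to the transformer at construction time instead of being read from the source JSON.

```python
import datetime
import typing
from dataclasses import dataclass


@dataclass
class ThingStub:
    """Hypothetical stand-in for a Thing record."""
    resolved_content: dict
    last_modified: typing.Optional[datetime.datetime]


class SketchGEOMETransformer:
    """Hypothetical transformer that receives the date from its caller."""

    def __init__(
        self,
        source_record: dict,
        last_updated_time: typing.Optional[str] = None,
    ):
        self.source_record = source_record
        self._last_updated_time = last_updated_time

    def last_updated_time(self) -> typing.Optional[str]:
        # GEOME JSON has no record-level modified date, so return the
        # value supplied at construction rather than reading the record.
        return self._last_updated_time


def transformer_for_thing(thing: ThingStub) -> SketchGEOMETransformer:
    """Build a transformer, passing the Thing's date through as a string."""
    stamp = None
    if thing.last_modified is not None:
        stamp = thing.last_modified.strftime("%Y-%m-%dT%H:%M:%SZ")
    return SketchGEOMETransformer(thing.resolved_content, stamp)
```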

While doing this work, we should also do a sanity check on the last modified date in the GEOME records. For instance, I noticed some dates in the future, which I corrected by hand when I ran the partial import today.
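The sanity check could be as simple as flagging any record dated after the current time. This is a sketch under assumptions: `find_future_dated` and the dict records with a timezone-aware `last_modified` key are hypothetical, not part of the existing code.

```python
import datetime
import typing


def find_future_dated(
    records: typing.Iterable[dict],
    now: typing.Optional[datetime.datetime] = None,
) -> typing.List[dict]:
    """Return records whose last_modified lies after `now`.

    `now` defaults to the current UTC time; passing it explicitly keeps
    the check reproducible. Records without a date are skipped.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return [
        record
        for record in records
        if record.get("last_modified") is not None
        and record["last_modified"] > now
    ]
```

Anything this returns would need to be reviewed before the partial import runs, since a future date would push the max value in Solr past real updates.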

The value is now populated in the transformer; we just need to wire it into the Things script. I think this will be just a couple of lines of code.

So, this is legitimately not complete.

This is done now.