techfromsage/tripod-php

Move view/table/search invalidation workflow out of saveChanges

Opened this issue · 3 comments

Currently when Tripod saves a changeset it:

  1. Determines all of the subjects that have been changed and each subject's changed predicates
  2. Stores the changes to CBD collections
  3. Fetches all of the updated CBD documents for the subjects of change
  4. Gathers all of the RDF types that trigger view, table, or search specifications
  5. Loops through all of the CBD documents and checks whether the rdf:types of the resource intersect with the RDF types of the views, tables, and search documents (in that order)
    • If they do, adds the resource ID and the operation in question (OP_VIEWS, OP_TABLES, OP_SEARCH) to the array of operations to perform
  6. Passes the array of operations generated from the last step to getOperationsForImpactedData, which
    • Queries the views collections for any appearance of the subjects of change in the impact index
    • For tables and search:
      1. Compiles all of the predicates defined in the table/search spec and determines if there's any overlap. If either the spec predicates or the resource predicates are empty, assumes one or the other has been removed and that all instances of the resource should be removed from that spec.
      2. Queries the tables/search collections for any appearance of the subjects of change in the impact indexes
  7. Loops through the array of operations to perform, creates a ModifiedSubject object for each task, and divvies them up into arrays of synchronous and asynchronous jobs
  8. Executes the synchronous tasks and pushes the asynchronous ones to the processQueue
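
To make the type and predicate checks in that workflow concrete, here is a rough sketch. It is not Tripod's actual code: the constant values, function names, and array shapes are all assumptions for illustration, and the empty-predicates rule is my reading of the description above.

```php
<?php
// Hypothetical stand-ins for Tripod's OP_VIEWS / OP_TABLES / OP_SEARCH constants.
const OP_VIEWS  = 'views';
const OP_TABLES = 'tables';
const OP_SEARCH = 'search';

/**
 * Decide which operations each changed subject triggers, based on rdf:type.
 *
 * @param array $resourceTypes map of subject ID => list of rdf:type values
 * @param array $specTypes     map of operation => list of types that trigger it
 * @return array map of subject ID => list of operations (views, tables, search order)
 */
function operationsForTypes(array $resourceTypes, array $specTypes): array
{
    $operations = [];
    foreach ($resourceTypes as $subject => $types) {
        foreach ([OP_VIEWS, OP_TABLES, OP_SEARCH] as $op) {
            // A non-empty intersection means at least one spec cares about this resource.
            if (array_intersect($types, $specTypes[$op] ?? [])) {
                $operations[$subject][] = $op;
            }
        }
    }
    return $operations;
}

/**
 * Predicate check for tables/search. If either side is empty, assume something
 * was removed and treat the spec as impacted.
 */
function specIsImpacted(array $specPredicates, array $changedPredicates): bool
{
    if (empty($specPredicates) || empty($changedPredicates)) {
        return true;
    }
    return (bool) array_intersect($specPredicates, $changedPredicates);
}
```

Neither function touches the CBD collections, which is part of why this logic looks movable out of saveChanges.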

It seems like saveChanges should only concern itself with 1 & 2. The methods to determine the affected views, tables, and search should be moved to MongoTripodViews, MongoTripodTables, and MongoSearchProvider respectively.

Obviously, we still need to be able to address synchronous operations.

The information needed to do this successfully is:

  1. The subjects and predicates of change
  2. A list of deleted subjects
  3. Pod name
  4. Store name
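
As a sketch of how those four pieces of information could travel together (for example, onto a queue), something like the following might work. ChangeSummary and all of its methods are hypothetical names, not existing Tripod classes.

```php
<?php
// Hypothetical value object; the class and method names are illustrative only.
final class ChangeSummary
{
    private $subjectsAndPredicates; // map of subject ID => list of changed predicates
    private $deletedSubjects;       // list of subject IDs that were deleted
    private $podName;
    private $storeName;

    public function __construct(array $subjectsAndPredicates, array $deletedSubjects, string $podName, string $storeName)
    {
        $this->subjectsAndPredicates = $subjectsAndPredicates;
        $this->deletedSubjects = $deletedSubjects;
        $this->podName = $podName;
        $this->storeName = $storeName;
    }

    public function getSubjects(): array
    {
        return array_keys($this->subjectsAndPredicates);
    }

    public function getChangedPredicates(string $subject): array
    {
        return $this->subjectsAndPredicates[$subject] ?? [];
    }

    public function isDeleted(string $subject): bool
    {
        return in_array($subject, $this->deletedSubjects, true);
    }

    // Plain-array form, so the summary can be serialized for an asynchronous job.
    public function toArray(): array
    {
        return [
            'changes' => $this->subjectsAndPredicates,
            'deleted' => $this->deletedSubjects,
            'pod'     => $this->podName,
            'store'   => $this->storeName,
        ];
    }

    public static function fromArray(array $data): self
    {
        return new self($data['changes'], $data['deleted'], $data['pod'], $data['store']);
    }
}
```

Because it only holds arrays and strings, the same object could be handed to a synchronous call or rebuilt from a queued payload.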

Can you provide any background info on why we would want to do this?

Good question :)

The status quo is problematic for 2 main reasons:

  1. Given that these operations can happen across multiple datasources (per #33), if Tripod is unable to reach one of those datasources during a save, the save throws an exception. The save to the CBD collection succeeds, but our views, tables, etc. are left in an unclean state.
  2. Realistically, saveChanges() is doing too much: it's not just storing the changes to the CBD graph. It's saving the graph, looking up a bunch of other documents, looking up views, table rows for invalidation, etc. At the very least this should be uncoupled from saveChanges() and moving the respective view, table, and search logic into their respective delegates would be an added bonus from a developer POV.

I spent a little time trying to sort out the problem in f8f2bf8, but that didn't get me terribly far - it was mainly to reacquaint myself with the workflow.

The biggest takeaway was that the current design creates a bunch of ModifiedSubjects that either get processed synchronously, or placed on the queue.

I think we need to break this up into two (or possibly more, depending on how you break this up) tasks that can be done synchronously or asynchronously, depending on what's been passed to the Tripod constructor:

  1. Determine what needs to be invalidated (so, by default, this process would be done synchronously for views, and asynchronously for tables and search)
  2. Do the invalidation and regeneration (again, by default synchronously for the views and asynchronously for the tables and search).

So it makes sense for 1) to be able to schedule 2), but there can't be an assumption that 2) will be a background task.
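
A minimal sketch of that two-phase idea follows. TaskRunner, invalidate(), and the mode constants are made-up names, and an in-memory array stands in for the real queue. The point is only that phase 1 schedules phase 2 through the same dispatcher, so neither phase assumes it runs in the background.

```php
<?php
// Hypothetical mode flags; in Tripod this would come from the constructor options.
const MODE_SYNC  = 'sync';
const MODE_ASYNC = 'async';

final class TaskRunner
{
    private $modes;     // map of operation => MODE_SYNC | MODE_ASYNC
    public $queue = []; // in-memory stand-in for a real job queue
    public $log = [];   // records what ran, for illustration only

    public function __construct(array $modes)
    {
        $this->modes = $modes;
    }

    // Run the task now or queue it, depending on the operation's configured mode.
    public function dispatch(string $operation, callable $task): void
    {
        if (($this->modes[$operation] ?? MODE_ASYNC) === MODE_SYNC) {
            $task();
        } else {
            $this->queue[] = $task;
        }
    }

    // Stand-in for a background worker; also runs tasks queued while draining.
    public function drainQueue(): void
    {
        while ($this->queue) {
            $task = array_shift($this->queue);
            $task();
        }
    }
}

function invalidate(TaskRunner $runner, string $operation, array $subjects): void
{
    // Phase 1: determine what needs to be invalidated.
    $runner->dispatch($operation, function () use ($runner, $operation, $subjects) {
        $runner->log[] = "discover:$operation";
        // Phase 2: invalidate and regenerate, scheduled the same way as phase 1.
        $runner->dispatch($operation, function () use ($runner, $operation, $subjects) {
            foreach ($subjects as $subject) {
                $runner->log[] = "regenerate:$operation:$subject";
            }
        });
    });
}
```

With views configured as MODE_SYNC, both phases finish before invalidate() returns. With tables as MODE_ASYNC, nothing runs until a worker drains the queue. A real version might want a separate mode per phase rather than per operation.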