Some Ruby/SPARQL code to construct an adjacency graph of Islandora objects.
The idea is to list all of the Islandora objects and their interesting properties (parents, content models, and state: :active, :deleted, or :inactive) and build an adjacency data structure that describes the Islandora object hierarchy. There may be cycles, but that's bad: strictly, the hierarchy should be a tree. We add two more states, :missing and :loop, for problem children.
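A minimal Ruby sketch of such an adjacency structure, assuming the hierarchy arrives as [child, parent] PID pairs and the states as a hash from PID to symbol. The helper names build_adjacency and mark_missing, and the demo PIDs, are hypothetical and not taken from this repository's scripts.

```ruby
# Hypothetical helper: index the hierarchy in both directions from
# [child_pid, parent_pid] pairs. A child may have several parents.
def build_adjacency(edges)
  parents  = Hash.new { |hash, pid| hash[pid] = [] }
  children = Hash.new { |hash, pid| hash[pid] = [] }
  edges.each do |child, parent|
    parents[child]   << parent
    children[parent] << child
  end
  [parents, children]
end

# Hypothetical helper: any PID that appears in the hierarchy but has no
# model/state record gets the extra :missing state.
def mark_missing(states, parents)
  (parents.keys + parents.values.flatten).uniq.each do |pid|
    states[pid] ||= :missing
  end
  states
end

# Example data (invented PIDs).
edges  = [["demo:2", "demo:1"], ["demo:3", "demo:1"], ["demo:3", "demo:9"]]
states = { "demo:1" => :active, "demo:2" => :active, "demo:3" => :deleted }

parents, children = build_adjacency(edges)
mark_missing(states, parents)
p children["demo:1"]
p states["demo:9"]
```

Keeping both directions costs memory but lets the same data answer "who are my parents?" (for lineages) and "who are my children?" (for walking down the tree).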
Given that data, we can find orphaned and missing objects, and other degenerate conditions such as the aforementioned loops. No kidding.
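One way to phrase the orphan and missing checks, as a sketch only: it assumes that "islandora:root" is the sole legitimate top of the tree and that an orphan is a live (:active or :inactive) object with no live parent. Both definitions are assumptions and may differ from what scripts/list-orphans actually does; find_orphans and find_missing are hypothetical names.

```ruby
# Assumption: "islandora:root" is the only legitimate top of the tree.
ROOTS = ["islandora:root"].freeze
# Assumption: states of a parent that can actually hold a child.
LIVE_STATES = [:active, :inactive].freeze

# Hypothetical orphan test: a live object that is not a root and has no
# live parent (no parents at all, or only :missing/:deleted ones).
def find_orphans(states, parents, roots = ROOTS)
  states.keys.select do |pid|
    next false if roots.include?(pid) || !LIVE_STATES.include?(states[pid])
    parents.fetch(pid, []).none? { |parent| LIVE_STATES.include?(states[parent]) }
  end
end

# Objects referenced somewhere but absent from the model/state data.
def find_missing(states)
  states.select { |_pid, state| state == :missing }.keys
end

states = {
  "islandora:root" => :active, "demo:1" => :active, "demo:2" => :active,
  "demo:3" => :inactive, "demo:4" => :active, "demo:8" => :deleted,
  "demo:9" => :missing
}
parents = {
  "demo:1" => ["islandora:root"], "demo:2" => ["demo:9"],
  "demo:3" => ["demo:8"], "demo:4" => []
}
p find_orphans(states, parents)
p find_missing(states)
```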
We need the model data as a CSV file with the columns "PID,ContentModel,ObjectState".
The child/parent CSV file has the columns "Child-PID,Parent-PID". There can, of course, be multiple entries for a given "Child-PID".
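A sketch of loading the two files with Ruby's standard CSV library. It assumes the files have no header row (if they do, drop the first row, e.g. CSV.foreach(path).drop(1)) and that the state column holds either a bare word such as "Active" or the Fedora URI form ending in "#Active"; both are assumptions about the ri-query output. The helper names are hypothetical.

```ruby
require 'csv'

# Assumption: the state column is "Active" or
# "info:fedora/fedora-system:def/model#Active" (and likewise for the others).
def state_symbol(value)
  case value.to_s.split("#").last.to_s.downcase
  when "active"   then :active
  when "deleted"  then :deleted
  when "inactive" then :inactive
  else :unknown
  end
end

# "PID,ContentModel,ObjectState" rows -> { pid => { models: [...], state: sym } }.
# An object with several content models appears on several rows.
def load_model_states(rows)
  rows.each_with_object({}) do |(pid, model, state), objects|
    entry = objects[pid] ||= { models: [], state: :unknown }
    entry[:models] << model if model && !entry[:models].include?(model)
    entry[:state] = state_symbol(state)
  end
end

# "Child-PID,Parent-PID" rows -> { child_pid => [parent_pid, ...] }.
def load_parents(rows)
  rows.each_with_object(Hash.new { |hash, pid| hash[pid] = [] }) do |(child, parent), parents|
    parents[child] << parent
  end
end

# With real files, CSV.foreach (no block) returns an enumerator that
# streams rows instead of reading the whole file at once:
#   objects = load_model_states(CSV.foreach("model-states.csv"))
#   parents = load_parents(CSV.foreach("parents.csv"))
```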
The lib/ folder contains two files with SPARQL queries; run them like so:
scripts/ri-query lib/model-states.sparql > model-states.csv
scripts/ri-query lib/parents.sparql > parents.csv
The ri-query program uses localhost:8080 to find the Islandora RDF endpoint; normally, this means you need to execute the ri-timer script on the Islandora server.
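For orientation, a hedged sketch of what such a query round trip could look like. It assumes the endpoint is a Fedora 3 Resource Index search service at localhost:8080/fedora/risearch accepting type=tuples, lang=sparql and format=CSV; that is an assumption about the deployment, not a description of how ri-query works. MODEL_STATES_QUERY is a hypothetical query built from the standard Fedora model predicates, not the contents of lib/model-states.sparql.

```ruby
require 'net/http'
require 'uri'

# Hypothetical query (not lib/model-states.sparql): one row per
# object/content-model pair, with the object's state.
MODEL_STATES_QUERY = <<~SPARQL
  PREFIX fm: <info:fedora/fedora-system:def/model#>
  SELECT ?object ?model ?state
  WHERE { ?object fm:hasModel ?model . ?object fm:state ?state . }
SPARQL

# Assumption: a Fedora 3 risearch endpoint on the local machine.
def risearch_uri(query, host: "localhost", port: 8080)
  uri = URI::HTTP.build(host: host, port: port, path: "/fedora/risearch")
  uri.query = URI.encode_www_form(type: "tuples", lang: "sparql",
                                  format: "CSV", query: query)
  uri
end

# Returns the response body as CSV text. Needs a live server, so it is
# not called here; usage would be something like:
#   File.write("model-states.csv", ri_query(MODEL_STATES_QUERY))
def ri_query(query)
  Net::HTTP.get(risearch_uri(query))
end
```

Note that a query like this returns full "info:fedora/..." URIs and, typically, a header row, so its output may need trimming before it matches the "PID,ContentModel,ObjectState" layout.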
Alternatively, you can parse FoXML files directly:
scripts/parse-foxml [ FoXML-files+ ]
This latter method is a work in progress, and it is much slower than the SPARQL queries above.
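A rough sketch of pulling the same facts out of a single FoXML document with REXML. It is not necessarily what scripts/parse-foxml does; it assumes the state is in the objectProperties "model#state" property, that RELS-EXT is inline XML written with the conventional "rdf:" prefix, that the last datastreamVersion is the newest, and that isMemberOfCollection, isMemberOf and isConstituentOf are the parent predicates. parse_foxml and the other names are hypothetical.

```ruby
require 'rexml/document'

NS = {
  "foxml" => "info:fedora/fedora-system:def/foxml#",
  "rdf"   => "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
}.freeze
MODEL_NS       = "info:fedora/fedora-system:def/model#".freeze
RELS_EXT_NS    = "info:fedora/fedora-system:def/relations-external#".freeze
STATE_PROPERTY = "info:fedora/fedora-system:def/model#state".freeze
# Assumption: which RELS-EXT predicates count as "parent" links.
PARENT_PREDICATES = %w[isMemberOfCollection isMemberOf isConstituentOf].freeze

def strip_info(uri)
  uri.sub(%r{\Ainfo:fedora/}, "")
end

# Hypothetical: extract PID, state, content models and parents from one
# FoXML document.
def parse_foxml(xml)
  root  = REXML::Document.new(xml).root
  prop  = REXML::XPath.first(root, "foxml:objectProperties/foxml:property[@NAME='#{STATE_PROPERTY}']", NS)
  state = prop ? prop.attributes["VALUE"].downcase.to_sym : :unknown
  # Assumption: the last RELS-EXT version in document order is the newest.
  version = REXML::XPath.match(root, "foxml:datastream[@ID='RELS-EXT']/foxml:datastreamVersion", NS).last
  desc    = version && REXML::XPath.first(version, ".//rdf:Description", NS)
  models, parents = [], []
  (desc ? desc.elements.to_a : []).each do |rel|
    # Assumes the RDF uses the conventional "rdf:" prefix.
    target = rel.attributes["rdf:resource"]
    next unless target
    if rel.namespace == MODEL_NS && rel.name == "hasModel"
      models << strip_info(target)
    elsif rel.namespace == RELS_EXT_NS && PARENT_PREDICATES.include?(rel.name)
      parents << strip_info(target)
    end
  end
  { pid: root.attributes["PID"], state: state, models: models, parents: parents }
end

# Usage: parse_foxml(File.read(path)) for each FoXML file.
```

Parsing every object's full XML is inherently slower than asking the triple store, which is consistent with the note above about this method's speed.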
Then use the utility scripts:
scripts/list-orphans model-states.csv parents.csv | sort > orphans.out
scripts/list-lineages model-states.csv parents.csv | sort > lineages.out
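A lineage can be read as every ancestor path from an object up to a parentless object. The sketch below computes those paths and tags the degenerate cases with the README's extra states: a path that revisits a PID is cut short and tagged :loop, and a parent absent from the known objects is tagged :missing. It is an illustration under those assumptions, not the implementation of scripts/list-lineages; the lineages method and demo PIDs are hypothetical.

```ruby
require 'set'

# Hypothetical: every ancestor path from +pid+ up to a parentless object.
# A path that revisits a PID is cut short and tagged :loop; a parent that is
# not in +known+ ends the path and is tagged :missing.
def lineages(pid, parents, known, path = [pid])
  ancestors = parents.fetch(pid, [])
  return [path] if ancestors.empty?

  ancestors.flat_map do |parent|
    if path.include?(parent)
      [path + [parent, :loop]]
    elsif !known.include?(parent)
      [path + [parent, :missing]]
    else
      lineages(parent, parents, known, path + [parent])
    end
  end
end

parents = {
  "demo:3" => ["demo:2", "demo:9"], "demo:2" => ["demo:1"],
  "loop:1" => ["loop:2"], "loop:2" => ["loop:1"]
}
known = Set["demo:1", "demo:2", "demo:3", "loop:1", "loop:2"]

%w[demo:3 loop:1].each do |pid|
  lineages(pid, parents, known).each { |line| puts line.join(" <- ") }
end
```

Because an object with several parents yields one path per parent, the number of lineages can grow quickly in a badly tangled hierarchy; checking the current path (rather than a global visited set) is what lets each path report its own loop.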
Check the data in the .out files and fix any inconsistencies.
Each step may take several minutes (at the time of writing we have more than 800,000 objects); this is not an interactive process.