Data repository of EAD XML files for Special Collections Finding Aids

Special Collections Finding Aids EAD Repository

This repository contains a live cut of the Archivists' Toolkit-exported EADs describing finding aids from various NYU-associated special collections. This data is harvested into a Solr index for the finding aids discovery portal.

The EAD publishing tool has a hook that pushes changes here, so this repository remains the single up-to-date source of EADs.

Polling Changes and Reindexing

Reindexing Changed Files In Real-time ![Build Status](http://jenkins.library.nyu.edu:8080/buildStatus/icon?style=flat&job=Finding Aids Production Solr Reindex Changed)

A Jenkins job is triggered every time this repository is updated. This job calls a reindex task that looks up the previous commit and updates the Solr index with every file changed since then.
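The real task runs inside the finding aids Rails app and uses its solr_ead updater. To illustrate only the "changed files" step, here is a hypothetical Python sketch (the function names are made up, not part of either project) that asks git which files differ between the previous commit and the new one. It assumes every EAD is an `.xml` file in this repo, and that deleted or renamed-away files should be dropped from the index rather than reindexed.

```python
from __future__ import annotations

import subprocess


def is_ead(path: str) -> bool:
    # Assumption: every EAD in this repo is an .xml file; anything else is ignored.
    return path.lower().endswith(".xml")


def parse_name_status(output: str) -> tuple[list[str], list[str]]:
    """Split `git diff --name-status` output into (to_index, to_delete) EAD paths."""
    to_index: list[str] = []
    to_delete: list[str] = []
    for line in output.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("R"):
            # Rename (e.g. "R100"): remove the old path, index the new one.
            to_delete.append(fields[1])
            to_index.append(fields[2])
        elif status.startswith("C"):
            # Copy: the source is untouched, so only the new path needs indexing.
            to_index.append(fields[2])
        elif status == "D":
            to_delete.append(fields[1])
        else:
            # A (added), M (modified), T (type changed), and so on.
            to_index.append(fields[1])
    return [p for p in to_index if is_ead(p)], [p for p in to_delete if is_ead(p)]


def changed_eads(previous: str, current: str = "HEAD", repo: str = ".") -> tuple[list[str], list[str]]:
    """Diff two commits in `repo` and return (to_index, to_delete)."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-status", previous, current],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_name_status(result.stdout)
```

Calling `changed_eads(previous_sha)` from a checkout returns two lists that an indexer could add to or remove from Solr. The parsing is kept separate from the git call so it can be checked without a repository; a production version would probably pass `-z` to git so that paths with unusual characters are not quoted.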

Reindexing Changed Files Nightly ![Build Status](http://jenkins.library.nyu.edu:8080/buildStatus/icon?job=Finding Aids Production Nightly Solr Reindex Changed Cron&style=flat)

In addition to real-time updates, we run a nightly job that reindexes any files changed in commits over the past 24 hours. This serves as a failsafe for failed real-time Jenkins builds or Jenkins downtime.
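The nightly pass can be approximated by asking `git log` for every file touched in the window. This is again a hypothetical Python sketch, not the job's actual code: `--since` filters on committer date, paths are de-duplicated because a file edited in several commits only needs one reindex, and a path that no longer exists in the checkout is treated as a deletion. The `.xml` filter is an assumption about how EADs are named.

```python
from __future__ import annotations

import os
import subprocess


def unique_xml_paths(log_output: str) -> list[str]:
    """Deduplicate .xml paths from `git log --name-only` output, keeping first-seen order."""
    seen: set[str] = set()
    paths: list[str] = []
    for line in log_output.splitlines():
        # Blank lines separate commits when using an empty --pretty format.
        if line and line.lower().endswith(".xml") and line not in seen:
            seen.add(line)
            paths.append(line)
    return paths


def nightly_candidates(repo: str = ".", hours: int = 24) -> tuple[list[str], list[str]]:
    """Return (to_index, to_delete) for EADs touched by commits in the last `hours` hours."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={hours} hours ago",
         "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    to_index: list[str] = []
    to_delete: list[str] = []
    for path in unique_xml_paths(result.stdout):
        # A file touched during the window may have been deleted since.
        if os.path.exists(os.path.join(repo, path)):
            to_index.append(path)
        else:
            to_delete.append(path)
    return to_index, to_delete
```

Reindexing a file that the real-time job already handled is harmless, which is what lets a nightly sweep like this act as a simple safety net.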

Full Reindex ![Build Status](http://jenkins.library.nyu.edu:8080/buildStatus/icon?style=flat&job=Finding Aids Production Solr Full Reindex Cron)

There is also an automated job that runs a full reindex in case things get out of sync. Note that a full reindex can take up to 5 days.

Technical Note: This job actually calls a downstream job that runs in the full finding aids Rails environment; that job updates its copy of this EAD repository in a subfolder before running the reindex task.

This EAD repository is not included as a submodule in the finding aids project because I don't want that project's Jenkins trigger to rebuild it every time this repo is updated. Since the data lives in a Solr index, keeping the two repositories completely separate works fine. However, when polling changes with git, I want the full Rails app environment on hand so I can use the built-in solr_ead index updater.

Known Problematic Commit Range

The commits in the range 5a67a80 to bfb2f0e (inclusive) are known to corrupt the state of the repo when it is checked out on a case-insensitive filesystem, which macOS and Windows use by default. For full details on this issue, see Jira ticket DLFA-155: Duplicate finding aids and filename collision in findingaids_eads Github repo.
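To check whether a given commit is affected, one option is to look for paths that differ only in letter case. The hypothetical Python sketch below (not the approach taken in DLFA-155) assumes the collisions are of that kind. It reads file names from git's object database with `git ls-tree`, so it works even on a case-insensitive checkout, and it also flags directories whose names differ only in case.

```python
from __future__ import annotations

import subprocess
from collections import defaultdict


def case_collisions(paths: list[str]) -> dict[str, list[str]]:
    """Map each case-folded path to the distinct spellings that collide on it."""
    groups: dict[str, set[str]] = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Check every leading directory as well as the full path: two
        # directories that differ only in case also collide.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            groups[prefix.casefold()].add(prefix)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


def tracked_paths(commit: str = "HEAD", repo: str = ".") -> list[str]:
    """List the files recorded in a commit's tree, without touching the working tree."""
    result = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "--name-only", commit],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Under that assumption, `case_collisions(tracked_paths("bfb2f0e"))` would list any case-only collisions present at the end of the problematic range, and an empty result for `HEAD` would suggest the current tree is safe to check out on macOS or Windows.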

Reporting Issues

Special collections content owners can create issues here for data errors in the EADs.