/aspace-data-remediation

Files, tools, and plans for remediating data for migration to ArchivesSpace at Smith.

The scope of the project is as follows:

Data cleanup and processing

  • single-level resource records (minor revisions to processor need to be applied) (EAD)
  • single-level resource records (cleanup complete) (InMagic resource records)
  • multi-level resource records (minor revisions to processor need to be applied) (EAD)
  • accession records
  • agent records
  • top containers (pushed out of scope -- more assessment and planning need to be undertaken)
  • reconciliation of subjects (to LCSH, AAT, and TGN) (pushed out of scope)
  • merge duplicate names and subjects (pushed out of scope)
  • normalization of dates (solution implemented)

Post records to prod

  • single-level resource records (EAD and InMagic resource records)
  • multi-level resource records (EAD)
  • accession records
  • agent records
  • top containers
  • container profiles (in process)
  • locations
  • classifications

Other

  • recommend solutions for item-level archival objects (images, AV, and objects) described out of context from parent collections (pushed out of scope)
  • training

ACCESSION RECORD SPECIFICATIONS

SCA accession JSON mapping

SCA accession data

SSC accession JSON mapping

SSC accession data

RESOURCE RECORD SPECIFICATIONS

SSC resource JSON mapping

SSC resource data

MRBC resource data identifiers

AGENT RECORD SPECIFICATIONS

SCA agent data JSON mapping

SCA faculty agent data

SCA alumnae agent data

SCA alumnae memorial agent data

SCA source agent data

SSC agent data JSON mapping

SSC agent data -- person

SSC agent data -- corporate

CONTAINER AND LOCATION RECORD SPECIFICATIONS

SSC Young 219 container data sample + JSON mapping

SCA CLASSIFICATION SPECIFICATIONS

SCA classification data + JSON mapping

SSC SUBJECT SPECIFICATIONS

SSC subject data + JSON mapping

EAD SPECIFICATIONS

Specifications for remediating and processing Smith SC EAD may be found here: sc_ead-specifications. To see information about manual updates, etc., check the spreadsheet. Specific decisions by element are below:

DECISIONS BY ELEMENT

| Element | Decision | Completed | Priority |
| --- | --- | --- | --- |
| archdesc @level | For most files, @level="collection". Exceptions: files that start with mnsss292 use @level="recordgrp"; files that start with mnsss148s use @level="series". | | |
| profiledesc/langusage | Replace existing content with "Finding aid written in English." | | high |
| lb | Enter a space where the line break tag previously existed. | | medium |
| archdesc/did/unitid | Use unitids in resource spreadsheets. | | high |
| archdesc/did/unittitle | For files with mnsss in filename, replace titles in this field with titles in this document: http://bit.ly/2pAj5dF. Match on EADID. For files with manosca and manoscmr in filename, replace "Records" with "records", "Papers" with "papers", and "Collection" with "collection". | | |
| archdesc/extent | First extent in EAD: the numeric character is entered in the number field. If the first extent's measurement in EAD = boxes, enter "boxes" in ASpace type. If the first extent's measurement in EAD = linear feet / linear ft., enter "Linear Feet" in ASpace type. The second extent in EAD is mapped to container summary. | | |
| archdesc/langmaterial | Enter the language subelement where it is missing from the langmaterial element. | | low |
| archdesc/bioghist/head | Remove head subelement. | | |
| archdesc/scopecontent/head | Remove head subelement. | | |
| archdesc/arrangement/head | Remove head subelement. | | |
| archdesc/descgrp | Remove descgrp element and retain content inside (accessrestrict, userestrict, processinfo, prefercite, altformavail, acqinfo, accruals, custodhist). | | high |
| bioghist/dao | Remove completely. Generate report of dao links, content, and the EAD ID from which they were pulled. | | high |
| unittitle | Remove trailing commas. | | high |
| unitdate @type | If there is more than one date or a range of dates in unitdate (e.g. "1897-1902" or "1965, 1967, 1989"), the value in @type should be "inclusive" rather than "single". | | medium |
| unitdate | Where "c.", "c", "ca", or "ca." appears, change to "circa". | | high |
| unitdate | Where "n.d." or "nd" appears, change to "undated". | | high |
| unitdate | When dates are in parentheses, remove parentheses and retain date information. | | medium |
| unitdate | When dates are in brackets, remove brackets and retain date information. | | medium |
| unitdate | Normalize dates in begin and end date fields so that they follow ISO 8601 formatting, e.g. YYYY-MM-DD. | | high |
| unitdate | Remove duplicate unitdates, created when moving unitdate out of unittitle. | | high |
| did/note | Change to odd element. | | high |
| dsc @type="analyticover"/c01 | Match c01s in analyticover dsc and in-depth dsc by unittitle. Copy over unittitle, extent, unitdate, arrangement, scopecontent from c01s in analyticover dsc to in-depth c01s. Where there are multiple unittitles, use the unittitle from the in-depth dsc. Where there are multiple matching (as in equal) extents and unitdates, use the extent and unitdate in the analyticover dsc. Where there are multiple non-matching extents and unitdates, keep both as repeatable extents or dates. The unitdates we are likely to come across in both the analyticover and in-depth dsc will either be a range or a range + n.d./single date in a single unitdate. It will always be an expression and not a normalized date. Equal means that the expression is identical. | | high |
| dsc/c/unittitle | Introduce sentence case formatting for series-level components (c01). | WontFix | low |
| dsc/c @level | For c02-cN (i.e. all but c01), change @level value from "unspecified" to "file". | | high |
| dsc/c/did/container | For c02-cN (i.e. all but c01), change @label value from "unspecified" to "mixed materials". | | high |
| c/did/note | Change note subelement to odd and move out of did element. Remove @href or @target and leave content of the note. | | medium |
| ref @target | Remove all ref subelements but leave content. Ref subelements will be found in dsc/c/odd and archdesc/arrangement. | | medium |
| archdesc/index | Pull out content from index and save as text file. Retain EADID for later integration. | | medium |
| archdesc/odd | Pull out appendix content from odd and save as text file. Retain EADID for later integration. | | medium |
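
Several of the decisions above are string-level rules (unitdate expressions, unitdate @type, unittitle trailing commas, and the first extent), so they can be illustrated without any XML tooling. The following is a minimal Ruby sketch, not the project's actual processor: every method name is hypothetical, the regular expressions are one possible reading of the rules, and counting four-digit years to choose @type is a simplifying assumption that will miss ranges such as "March 3-5, 1900".

```ruby
# Hypothetical helpers illustrating the string-level rules in the table.
# None of these names come from the repository.

# Clean a unitdate expression: drop parentheses and brackets,
# expand "n.d."/"nd" to "undated" and "c."/"c"/"ca"/"ca." to "circa".
def clean_unitdate(expr)
  text = expr.strip.delete("()[]")
  text = text.gsub(/\bn\.d\.|\bnd\b/i, "undated")
  # Assumption: the abbreviation is only expanded when a digit follows it.
  text = text.gsub(/\bca?\.?\s*(?=\d)/i, "circa ")
  text.squeeze(" ").strip
end

# Choose the @type value. Assumption: more than one four-digit year
# means several dates or a range, so the type is "inclusive".
def unitdate_type(expr)
  expr.scan(/\d{4}/).length > 1 ? "inclusive" : "single"
end

# Remove trailing commas from a unittitle.
def strip_trailing_commas(title)
  title.sub(/,+\s*\z/, "")
end

# Split the first EAD extent into the number and the ASpace extent type.
# Returns nil when the string does not start with a number.
def map_first_extent(extent)
  match = extent.strip.match(/\A(\d+(?:\.\d+)?)\s+(.+)\z/)
  return nil unless match

  type = case match[2].downcase
         when "boxes" then "boxes"
         when "linear feet", "linear ft." then "Linear Feet"
         end
  { number: match[1], type: type }
end

clean_unitdate("[ca. 1900]")        # => "circa 1900"
unitdate_type("1965, 1967, 1989")   # => "inclusive"
map_first_extent("3.5 linear ft.")  # => { number: "3.5", type: "Linear Feet" }
```

Measurements other than boxes and linear feet come back with a nil type here, so that they can be reviewed by hand rather than guessed at.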