CDRH/earlywashingtondc

enslaved.org integration

We have been asked to contribute data to enslaved.org in the form of a CSV.

Without further information, I am assuming from their search page that they are interested in:

  • people
  • events
  • places
  • sources

We could likely line up the OSCYS data as:

  • people
  • events (cases)
  • places (jurisdictions / birth places)
  • sources (documents)

Unfortunately, I do not believe all of the information we would want to send them is in Solr, so we would likely need to write a script that pulls some results from Solr and combines them with the personography files and the TTL file.

Documents should be good to go either from Solr or from the TEI files, as they list person ids and case ids.
Cases are something we can get entirely from Solr, as they are aggregated from documents.
People are tricky because there is likely more information in the personography than in Solr (need to confirm), and there is also a lot of relationship information in our TTL file.
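
A rough sketch of how that combining step could look, assuming the three sources have already been loaded into plain Python structures. The input shapes, the function names, and the sample values are all made up for illustration; none of this reflects the real Solr schema, the personography layout, or the predicates in the TTL file.

```python
from collections import defaultdict


def merge_people(solr_docs, personography, triples):
    """Combine per-person data from three sources, keyed by person ID.

    All three input shapes are assumptions made for this sketch:
      solr_docs     -- list of dicts, each with an "id" key
      personography -- dict mapping person ID to a dict of fields
      triples       -- (subject_id, predicate, object_id) tuples already
                       parsed out of the TTL file
    """
    people = defaultdict(
        lambda: {"solr": {}, "personography": {}, "relationships": []}
    )
    for doc in solr_docs:
        people[doc["id"]]["solr"] = doc
    for person_id, fields in personography.items():
        people[person_id]["personography"] = fields
    for subject, predicate, obj in triples:
        people[subject]["relationships"].append((predicate, obj))
    return dict(people)


def fields_only_in_personography(record):
    """List personography fields that have no counterpart in the Solr doc."""
    return sorted(set(record["personography"]) - set(record["solr"]))


# Invented sample values, only to show the call shape.
merged = merge_people(
    solr_docs=[{"id": "per.000001", "name": "Jane Doe"}],
    personography={"per.000001": {"name": "Jane Doe", "birth_place": "Maryland"}},
    triples=[("per.000001", "enslavedBy", "per.000002")],
)
print(fields_only_in_personography(merged["per.000001"]))
```

Something like `fields_only_in_personography` might also help with the "need to confirm" item, since it shows which personography fields never made it into Solr.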

What we need to find out:

  • Are they interested in information about people who are not enslaved? (enslavers, judges, attorneys, etc)
  • Do they want cases and documents?
  • How much relationship information do they want about people?
  • Will this be something that we can update, or is this a one-time thing? (This will determine what kind of script I write.)
  • When is the deadline?

I have more details about this now. I think this is the way to start:

  • build two datasets in CSV format, one for "people" and one for "events"
    • people: will be built from the personography, reformatting its fields as CSV fields and separating multiple values with ";". Look to the PDF linked from this page for help with field definitions: https://docs.enslaved.org/metadata/personMetadata/
      • for ID, I think we should have two IDs: "local", which is the last part of the URL, and "namespaced", which is the whole URL
    • events: there will be one event per case file; the case files are the TEI documents whose names start with "oscys.caseid". Each case file should be a row in the CSV and follow the metadata defined at https://docs.enslaved.org/metadata/eventMetadata/
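
A minimal sketch of the row-building pieces described above, assuming each person has already been read from the personography into a dict with a "url" key. The column names, the `example.org` URL, and the filenames used with these functions are placeholders; the real columns should come from the enslaved.org metadata docs linked above.

```python
import csv
import io

# Column names are placeholders for this sketch; the real ones should
# come from the enslaved.org person metadata documentation.
PEOPLE_FIELDS = ["local_id", "namespaced_id", "name", "occupation"]


def split_id(url):
    """Return (local, namespaced): the last URL segment and the whole URL."""
    return url.rstrip("/").rsplit("/", 1)[-1], url


def person_row(person):
    """Flatten one person record into a CSV row; lists are joined with ';'."""
    local_id, namespaced_id = split_id(person["url"])
    row = {"local_id": local_id, "namespaced_id": namespaced_id}
    for field in PEOPLE_FIELDS[2:]:
        value = person.get(field, "")
        row[field] = ";".join(value) if isinstance(value, list) else value
    return row


def people_csv(people):
    """Render person records as CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=PEOPLE_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(person_row(p) for p in people)
    return buffer.getvalue()


def case_files(filenames):
    """Keep only the TEI case files, i.e. names starting with 'oscys.caseid'."""
    return sorted(name for name in filenames if name.startswith("oscys.caseid"))
```

For example, `split_id("https://example.org/people/per.000001")` gives `("per.000001", "https://example.org/people/per.000001")`. Going through `csv.DictWriter` rather than joining strings by hand means any value that happens to contain a comma or a quote still gets escaped correctly.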

Write scripts in whatever form is useful (Ruby, XSLT, Python) and save them in the scripts folder (https://github.com/CDRH/data_oscys/tree/main/scripts), in an "enslaved.org_scripts" (or similarly named) folder. Save output files in https://github.com/CDRH/data_oscys/tree/main/output/data_export (we will need to update the README for that folder and some of the filenames).

Based on the request for source information, which is only included in document files, we are likely to need an additional document dataset.