This project started at the Code the City 19 History and Data event.

Its purpose is to gather data on Aberdeen-built ships from the Aberdeen Ships site, with the permission of the site's owners, and to push that data in bulk onto Wikidata as open data, with links back to the Aberdeen Ships site through a new identifier.

Progress to date

updated 05 Jun 2020

So far the following has been accomplished.

CTC20 1-2 Aug 2020

Core data was imported into Wikidata for most of the ships. Ships whose name field was blank, UNKNOWN or UNNAMED were excluded from the import.

I initially tried to use the CSV format for Wikidata QuickStatements but couldn't get it to work, so I switched to the TSV version. A Python script was written to generate the QuickStatements file, which could then be copied into the QuickStatements batch import tool. The import produced 2 errors: two ships had a range of years in the Date field, which generated invalid dates in the QuickStatements. These (and 2 duplicates that I noticed after the import) are noted for later correction.
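
The project's own script is not reproduced here. The sketch below shows one way such a generator could look, assuming each ship is a dict with hypothetical keys `name`, `abs_id` and `date`, and assuming the Date field maps to inception (P571). It emits QuickStatements tab-separated `CREATE`/`LAST` commands, skips blank, UNKNOWN and UNNAMED names, and sets aside any date that is not a single four-digit year (such as the year ranges behind the two import errors) for manual correction rather than writing an invalid date.

```python
import re
from typing import Dict, List, Tuple

# Placeholder names that mark a record as unusable for import.
SKIP_NAMES = {"", "UNKNOWN", "UNNAMED"}

# Only a bare four-digit year counts as valid; a range such as "1852-1853" does not.
YEAR_RE = re.compile(r"\d{4}")


def qs_year(year: str) -> str:
    """Format a year as a QuickStatements time value with year precision (/9)."""
    return f"+{year}-00-00T00:00:00Z/9"


def build_batch(ships: List[Dict[str, str]]) -> Tuple[str, List[str]]:
    """Return (TSV batch text, ABS IDs whose dates need manual review).

    The keys 'name', 'abs_id' and 'date' are hypothetical field names.
    """
    lines: List[str] = []
    review: List[str] = []
    for ship in ships:
        name = ship.get("name", "").strip()
        if name.upper() in SKIP_NAMES:
            continue  # blank, UNKNOWN or UNNAMED: excluded from the import
        abs_id = ship["abs_id"]
        lines += [
            "CREATE",
            f'LAST\tLen\t"{name}"',      # English label
            "LAST\tP31\tQ11446",         # instance of: ship
            "LAST\tP1071\tQ36405",       # location of creation: Aberdeen
            f'LAST\tP8260\t"{abs_id}"',  # Aberdeen Built Ships ID
        ]
        date = ship.get("date", "").strip()
        if YEAR_RE.fullmatch(date):
            # Assumption: the Date field is mapped to inception (P571).
            lines.append(f"LAST\tP571\t{qs_year(date)}")
        elif date:
            review.append(abs_id)  # e.g. a range of years; correct by hand
    return "\n".join(lines), review
```

The batch text returned by `build_batch` is what would be pasted into the QuickStatements tool; the second value lists the records to fix by hand, so bad dates never reach Wikidata.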

The ABS ID property (P8260) was manually added to the ships that already existed in Wikidata.

The mapping between QIDs and ABS IDs was obtained with this SPARQL query:

SELECT ?qid ?absid
WHERE
{
  ?qid wdt:P8260 ?absid.
}
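
To reuse that mapping in a script, the same query can be sent to the Wikidata Query Service, which returns the standard SPARQL JSON results format. A minimal standard-library sketch follows; the User-Agent string is a placeholder to replace with a real project name and contact, since Wikimedia asks automated clients to identify themselves.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?qid ?absid
WHERE
{
  ?qid wdt:P8260 ?absid.
}
"""


def fetch_results(query: str = QUERY) -> dict:
    """Run the query against the Wikidata Query Service and parse the JSON."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Placeholder User-Agent: replace with a real project name and contact.
    req = urllib.request.Request(url, headers={"User-Agent": "abs-wikidata-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def abs_to_qid(results: dict) -> Dict[str, str]:
    """Map each ABS ID to its QID, stripping the entity URI prefix."""
    mapping: Dict[str, str] = {}
    for row in results["results"]["bindings"]:
        # ?qid comes back as a URI, e.g. "http://www.wikidata.org/entity/Q123".
        qid = row["qid"]["value"].rsplit("/", 1)[-1]
        mapping[row["absid"]["value"]] = qid
    return mapping
```

Calling `abs_to_qid(fetch_results())` gives a dictionary keyed by ABS ID, which is convenient both for spotting ships that still lack an item and for the duplicate check after an import.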

Next Steps?

To complete the project, the following needs to be done:

  • Rationalise all ship builders in ship_builders.csv: deduplicate them and create Wikidata entries for each one we will use
  • Rationalise all ship types in ship_types.csv: deduplicate them and create Wikidata entries for each one we will use
  • Extract and rationalise data from some of the fields, e.g. there is a single dimensions field rather than separate fields for length/beam/draft/..., and its contents are inconsistent
  • Isolate ships that have no Wikidata identifier, i.e. any not in the list of 59 positive matches. Set aside those that do have entries for later processing.
  • Decide on the best route for bulk upload, e.g. QuickStatements. This may be useful: Wikidata Import Guide
  • Agree a core set of data for each ship, to be parsed from ships.json and added to Wikidata. See Wikidata Ship Properties below.
  • Create a script that outputs text for a CSV or other file to be used by QuickStatements (assuming that is the right tool) for bulk input, ensuring that shipbuilder IDs and ABS identifiers are linked
  • Deal with adding data to the 59 existing Wikidata entries
  • Develop a means of monitoring both the original ABS system (rescrape periodically and diff the results in some way?) and Wikidata for changes to the ships' records (a Wikidata query executed periodically, generating a CSV download that is checked for differences from previous runs?), to feed changes back to ABS
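
Three of these steps (deduplicating names, isolating ships without a QID, and diffing periodic snapshots) can be sketched with small helpers. This is a minimal sketch under stated assumptions: names count as duplicates only when they match after case-folding and replacing punctuation, so variants such as "& Co." vs "and Company" still need a manual pass; the snapshot key column `absid` is hypothetical; and the ABS-ID-to-QID mapping is assumed to come from the SPARQL query above.

```python
import csv
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Snapshot = Dict[str, Dict[str, str]]


def normalise(name: str) -> str:
    """Case-fold, replace punctuation (except '&') with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s&]", " ", name.casefold())
    return " ".join(cleaned.split())


def group_duplicates(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw builder or type names that share a normalised form."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        if name.strip():
            groups[normalise(name)].append(name)
    return dict(groups)


def split_by_wikidata(abs_ids: Iterable[str], mapping: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (ABS IDs already on Wikidata, ABS IDs still needing an item)."""
    ids = set(abs_ids)
    matched = ids.intersection(mapping)
    return matched, ids - matched


def load_snapshot(path: str, key: str = "absid") -> Snapshot:
    """Load a CSV snapshot into {key value: row}; the key column is an assumption."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key]: row for row in csv.DictReader(f)}


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return sorted (added, removed, changed) keys between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

For example, `group_duplicates(row["name"] for row in csv.DictReader(f))` would list candidate duplicates from ship_builders.csv, assuming a hypothetical `name` column; each group with more than one spelling then needs a single Wikidata entry.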

Wikidata Ship Properties

The following have been identified as potential Wikidata statements that we need to consider using. Not all ships will have all data available. Core ones have (*) after them.

  1. Label (*)
  2. Description (*)
  3. Instance of (P31) (*)
      • Ship or, if available, a subclass such as:
          • Schooner
          • Clipper
          • Whaler
          • Brig, etc.
      • Note: this can have multiple values.
  4. Name (P2561) or official name (P1448)? (*) Could have multiple values, each with start and end dates
  5. Aberdeen Built Ships ID (P8260) (*)
  6. Significant event (P793): include events such as order (Q566889), keel laying (Q14592615), ceremonial ship launching (Q596643), ship decommissioning (Q7497952) and shipwrecking (Q906512), but also sea voyages etc., each with point in time (P585). Voyages could have a destination and start and end dates. Also destruction, breaking up etc.
  7. Cost (P2130)
  8. Mass (P2067)
  9. Gross tonnage (P1093)
  10. Length (P2043)
  11. Beam (P2261)
  12. Draft (P2262)
  13. Number of masts (P1099)
  14. Speed (P2052)
  15. Manufacturer (P176): take values from the table of ship builders
  16. Location of creation (P1071): Aberdeen (Q36405) (*)
  17. Country of origin (P495): Kingdom of Great Britain 1707-1801, United Kingdom of Great Britain and Ireland 1801-1927, United Kingdom 1927 onwards (*)
  18. Service entry (P729)
  19. Service retirement (P730)
  20. Described at URL (P973) with a link to ABS (possibly unnecessary, given we'll have a specific ABS ID)
  21. Country of registry (P8047): could have multiple values/dates
  22. Home port (P504): could have multiple values/dates
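
Two of these properties lend themselves to small helpers. The sketch below assumes, hypothetically, that a usable dimensions value holds up to three numbers in feet separated by "x", in the order length, beam, draft; because the real field is inconsistent, anything else is returned as None for manual review. QuickStatements writes a quantity with a unit as the amount, `U`, and the unit item's number without its `Q` (here Q3710, foot). The country-of-origin helper encodes the period boundaries for P495 (1707, 1801 and 1927); the QIDs in the comments should be checked on Wikidata before a real batch is run.

```python
import re
from typing import List, Optional

# Numeric part of Q3710 (foot), as used in QuickStatements "amount U unit" values.
FOOT = "3710"

# Assumed order of the numbers in the dimensions field: length, beam, draft.
DIMENSION_PROPS = ["P2043", "P2261", "P2262"]

NUMBER = r"\d+(?:\.\d+)?"
# Hypothetical regular form: "120.5 x 25.3 x 14.2 ft" (beam and draft optional).
DIMS_RE = re.compile(
    rf"\s*({NUMBER})(?:\s*x\s*({NUMBER}))?(?:\s*x\s*({NUMBER}))?\s*(?:ft|feet)?\s*",
    re.IGNORECASE,
)


def dimension_statements(dims: str) -> Optional[List[str]]:
    """Turn a regular dimensions string into QuickStatements lines, else None."""
    match = DIMS_RE.fullmatch(dims)
    if match is None:
        return None  # irregular text: leave for manual review
    return [
        f"LAST\t{prop}\t{value}U{FOOT}"
        for prop, value in zip(DIMENSION_PROPS, match.groups())
        if value is not None
    ]


def country_of_origin(year: int) -> Optional[str]:
    """Pick a P495 value from the build year; verify these QIDs on Wikidata."""
    if year < 1707:
        return None        # before the Kingdom of Great Britain existed
    if year < 1801:
        return "Q161885"   # Kingdom of Great Britain
    if year < 1927:
        return "Q174193"   # United Kingdom of Great Britain and Ireland
    return "Q145"          # United Kingdom
```

The `LAST` lines are meant to follow a `CREATE` line for a new item; for the 59 existing entries the same statements would use the item's QID in place of `LAST`.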