Library of Congress Reconciliation Service for OpenRefine (LCNAF, LCSH)

This is a web service that implements the OpenRefine Reconciliation Service API. It reconciles names against the Library of Congress Name Authority File (LCNAF) and subjects against the Library of Congress Subject Headings (LCSH).

How does it work?

This service attempts to fetch names and subjects from the Library of Congress using the following methods sequentially:

The reconciliation score, which indicates how good the match is, is determined using the Python difflib library.
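As a rough illustration of difflib-based scoring, the sketch below ranks a few candidate labels against a query string. It assumes the score is the ratio from difflib.SequenceMatcher; which difflib function the service actually calls is not stated here, and the similarity helper and sample labels are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(query: str, candidate: str) -> float:
    """Return a similarity ratio between 0.0 (nothing shared) and 1.0 (identical)."""
    # Assumption: the score is SequenceMatcher.ratio(); the service may differ.
    return SequenceMatcher(None, query, candidate).ratio()


# Hypothetical candidate labels, for illustration only.
query = "Twain, Mark, 1835-1910"
candidates = [
    "Dickens, Charles, 1812-1870",
    "Twain, Shania",
    "Twain, Mark, 1835-1910",
]

# Sort candidates so the closest label comes first.
ranked = sorted(candidates, key=lambda label: similarity(query, label), reverse=True)

for label in ranked:
    print(f"{similarity(query, label):.2f}  {label}")
```

An identical label scores 1.0, which is the kind of perfect match the auto-match option in OpenRefine would accept.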

Installation

  • Ensure Python 3 is installed. This program was developed with Python 3.4.3.
  • Download this repository locally (git clone or .zip)
  • Navigate to your local copy of the program in the command line interface
  • Install the program requirements by typing python -m pip install -r requirements.txt
  • Start the program by typing python LoCreconcile.py (or run in IDLE or another IDE)

Usage in OpenRefine

  • Click the arrow in the header of the column of names and/or subjects you wish to reconcile.
  • Click Reconcile > Start reconciling...
  • Click the Add Standard Service... button in the bottom left of the reconciliation menu
  • Under Enter the service's URL, enter the URL http://127.0.0.1:5000/reconcile/LoC
  • Note that once the service has been added using the previous steps, you can simply select "LC Reconciliation Service" from the reconciliation menu in the future.
  • In the following menu, Names reconciles from LCNAF, Subjects reconciles from LCSH, and LoC reconciles from both.
  • If Auto-match candidates with high confidence is selected, perfect matches will be reconciled automatically.
  • If you do not enter a value for Maximum number of candidates to return, the program will attempt to return up to 3 candidate matches for each name/subject.
  • Click Start Reconciling
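
Outside OpenRefine, the same endpoint can in principle be queried directly, which may help when checking that the service is running. The following is a minimal sketch, assuming the service accepts the batch form of the Reconciliation Service API (a form parameter named queries holding a JSON object of keyed queries) over POST. The build_queries and reconcile helpers are hypothetical names, not part of this program.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the steps above; adjust if the service runs elsewhere.
SERVICE_URL = "http://127.0.0.1:5000/reconcile/LoC"


def build_queries(values, limit=3):
    """Map each value to a keyed query object: q0, q1, ..."""
    return {f"q{i}": {"query": value, "limit": limit} for i, value in enumerate(values)}


def reconcile(values, limit=3, url=SERVICE_URL):
    """Send a batch of queries; the service must already be running locally."""
    # Assumption: the service reads a form-encoded "queries" parameter via POST.
    body = urlencode({"queries": json.dumps(build_queries(values, limit))}).encode("utf-8")
    request = Request(url, data=body)  # supplying data makes this a POST request
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Show the payload that would be sent, without contacting the service.
print(json.dumps(build_queries(["Twain, Mark", "Whales"]), indent=2))
```

With the service started, calling reconcile(["Twain, Mark"]) should return a JSON object keyed by q0, though the exact fields in each result depend on the service.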

Interpreting the Results

The results of reconciliation will be links to URIs of the best matching names and subjects the service could find.

Example: http://id.loc.gov/authorities/names/n85243950

One of the best ways to expedite the reconciliation process is to start with the near-perfect matches. Use the best candidate's score facet to review names with reconciliation scores of .80 and above first, then keep lowering the score range until the matches no longer seem correct. Consult the OpenRefine wiki pages on reconciliation and the Reconciliation Service API for more information. You can also search the web for guides.
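
The review order described above can be sketched in code for candidates that have been exported from OpenRefine. This is only an illustration: the score_bands helper, the band boundaries, and the sample rows are hypothetical, and scores are assumed to lie between 0 and 1.

```python
def score_bands(candidates, start=80, stop=50, step=10):
    """Yield (lower_bound, rows) from the highest band down; bounds are percents."""
    high = 101  # the top band also includes perfect scores of 1.0
    for low in range(start, stop - 1, -step):
        rows = [c for c in candidates if low / 100 <= c["score"] < high / 100]
        yield low / 100, rows
        high = low  # the next band ends where this one began


# Hypothetical candidates and scores, for illustration only.
candidates = [
    {"name": "Twain, Mark, 1835-1910", "score": 1.0},
    {"name": "Whales", "score": 0.86},
    {"name": "Twain, Shania", "score": 0.67},
    {"name": "Whaling", "score": 0.55},
]

for lower_bound, rows in score_bands(candidates):
    names = ", ".join(row["name"] for row in rows) or "(none)"
    print(f"score >= {lower_bound:.2f}: {names}")
```

Reviewing stops at whichever band first contains matches that look wrong; everything below that band needs manual checking.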

Other reconciliation services: