
Program to extract data from MARC authority files.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

AuthorityFiles

Updates in progress.

Statement of Purpose

This Python 3 program extracts data from MARC8 authority files and writes it to a CSV file so that the data can be manipulated in a spreadsheet. The program extracts data from only one type of authority file per run. An optional keyword can be specified; the program then searches all fields and outputs only the records that contain that keyword.
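The keyword filter described above can be pictured roughly as follows. This is a minimal sketch, not code from subjauth.py: it assumes each record has already been reduced to a list of `(tag, [(code, value), ...])` pairs, and the helper names `record_matches` and `filter_records`, as well as the case-insensitive comparison, are hypothetical choices made for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed record model: one (tag, subfields) pair per data field,
# where subfields is a list of (code, value) tuples.
Subfield = Tuple[str, str]
Field = Tuple[str, List[Subfield]]
Record = List[Field]


def record_matches(record: Record, keyword: str) -> bool:
    """Return True if `keyword` occurs in any subfield of any field.

    Matching is case-insensitive here; that is an assumption, not
    necessarily how subjauth.py compares text.
    """
    needle = keyword.casefold()
    return any(
        needle in value.casefold()
        for _tag, subfields in record
        for _code, value in subfields
    )


def filter_records(records: Iterable[Record],
                   keyword: Optional[str] = None) -> Iterator[Record]:
    """Yield only the records containing `keyword` (every record if None)."""
    for record in records:
        if keyword is None or record_matches(record, keyword):
            yield record


if __name__ == "__main__":
    records = [  # made-up sample records
        [("010", [("a", "sh 00000001")]), ("150", [("a", "Cats")])],
        [("010", [("a", "sh 00000002")]), ("150", [("a", "Dogs")]),
         ("680", [("i", "Here are entered works on domestic dogs.")])],
    ]
    # A phrase keyword matches only the second record.
    print(len(list(filter_records(records, "domestic dogs"))))  # prints 1
```

Searching every subfield of every field mirrors the "search all fields" behavior described above; a phrase is simply treated as one substring.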

It uses the specialized Python library pymarc to handle the MARC format, along with several modules from the Python standard library.
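To give a sense of what pymarc handles, here is an illustrative, standard-library-only sketch of the ISO 2709 record structure that MARC files use: a 24-character leader, 12-byte directory entries, and field and subfield delimiters. It is not taken from subjauth.py, and the function names `parse_iso2709` and `build_iso2709` are hypothetical. It decodes text as ASCII, so it does not convert non-ASCII MARC-8 characters; proper MARC-8 conversion is one of the things pymarc takes care of.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def parse_iso2709(raw: bytes, encoding: str = "ascii"):
    """Split one ISO 2709 (MARC) record into (leader, fields).

    Data fields come back as (tag, indicators, [(code, value), ...]);
    control fields (tags below 010) as (tag, None, data).
    """
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])            # leader/12-16: base address of data
    directory = raw[24:base - 1]         # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length].rstrip(FIELD_TERMINATOR)
        if tag < "010":
            fields.append((tag, None, data.decode(encoding)))
        else:
            subfields = [
                (chunk[:1].decode(encoding), chunk[1:].decode(encoding))
                for chunk in data[2:].split(SUBFIELD_DELIMITER)
                if chunk
            ]
            fields.append((tag, data[:2].decode(encoding), subfields))
    return leader, fields


def build_iso2709(fields, encoding: str = "ascii") -> bytes:
    """Assemble fields in the same shape into an authority record (leader/06 'z')."""
    directory, body = b"", b""
    for tag, indicators, content in fields:
        if indicators is None:           # control field: no indicators/subfields
            data = content.encode(encoding)
        else:
            data = indicators.encode(encoding) + b"".join(
                SUBFIELD_DELIMITER + (code + value).encode(encoding)
                for code, value in content
            )
        data += FIELD_TERMINATOR
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1
    length = base + len(body) + 1
    leader = f"{length:05d}nz   22{base:05d}n  4500"
    return (leader.encode("ascii") + directory + FIELD_TERMINATOR
            + body + RECORD_TERMINATOR)
```

Round-tripping a small made-up record through `build_iso2709` and then `parse_iso2709` returns the same fields, which is a quick way to see how the directory offsets line up with the field data.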

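The current functions cover the extraction of:

  • Library of Congress Subject Headings (LCSH)
  • LCSH form subdivisions
  • LCSH general subdivisions
  • Children's subject headings (CYAC)
  • Library of Congress Genre/Form Terms (LCGFT)
  • Library of Congress Demographic Group Terms (LCDGT)
  • Library of Congress Medium of Performance Thesaurus for Music (LCMPT)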

A free copy of the Library of Congress Subject Headings authority file in MARC8 is available from the MARC Distribution Services.

How to run the program

python subjauth.py <input file path> -type [sh | fd | gd | dg | gf | sj | mp] -o <csv path> [-key <keyword string>]

Options

Option      Explanation
-type sh    Subject authority records (LCSH)
-type fd    Subject authority records for form subdivisions
-type gd    Subject authority records for general subdivisions
-type sj    Children's subject authority records (CYAC)
-type gf    Genre/form authority records (LCGFT)
-type dg    Demographic group terms authority records (LCDGT)
-type mp    Medium of performance terms authority records (LCMPT)
-o          Output location and filename for the CSV file
-key        Keyword to search for in all fields; only records containing it are output (enclose phrases in quotes)
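The command line documented above maps naturally onto Python's standard `argparse` module. The sketch below reproduces only the documented options; it is an illustration, not the actual parser in subjauth.py, and the `dest` names (`record_type`, `output`, `keyword`) and the `build_parser` function are hypothetical.

```python
import argparse

# Type codes from the options table; descriptions paraphrased from it.
RECORD_TYPES = {
    "sh": "Subject authority records",
    "fd": "Subject authority records for form subdivisions",
    "gd": "Subject authority records for general subdivisions",
    "sj": "Children's subject authority records",
    "gf": "Genre/form authority records",
    "dg": "Demographic group terms authority records",
    "mp": "Medium of performance terms authority records",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: subjauth.py <input> -type T -o <csv> [-key K]."""
    parser = argparse.ArgumentParser(
        prog="subjauth.py",
        description="Extract data from MARC authority files to CSV.",
    )
    parser.add_argument("input", help="path to the MARC authority file")
    # argparse accepts single-dash, multi-letter option names like -type.
    parser.add_argument("-type", dest="record_type", required=True,
                        choices=sorted(RECORD_TYPES),
                        help="type of authority records to extract")
    parser.add_argument("-o", dest="output", required=True,
                        help="output location and filename for the CSV file")
    parser.add_argument("-key", dest="keyword", default=None,
                        help="keyword to search for (quote phrases)")
    return parser


if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```

Restricting `-type` with `choices` makes argparse reject an unknown type code with a usage message before any file is read.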

The CSV file

The CSV file contains three columns:

  • LCCN (Library of Congress Control Number)
    • MARC field 010 $a
  • Heading text
    • MARC field 1XX $a (with possible additional subfields) for headings in LCSH, LCGFT, LCDGT, CYAC, and LCMPT
    • MARC field 185 $v (with possible additional subfields $v or $x) for LCSH form subdivisions
    • MARC field 185 $x (with possible additional subfields $v or $x) for LCSH general subdivisions
  • Scope note, if one exists
    • MARC field 680 ($i, plus any $a subfields)
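Here is a minimal sketch of how those three columns could be assembled, assuming each record has already been read (for example with pymarc) into a list of `(tag, [(code, value), ...])` pairs. The helper names, the `--` separator between heading subfields, the set of subfields kept in the heading (a, b, v, x, y, z), the header row, and the use of only the first 680 field are all assumptions, not necessarily what subjauth.py does. The hypothetical `subdivision_kind` helper shows how the 185 $v versus 185 $x distinction could separate form subdivisions from general subdivisions.

```python
import csv
import io
from typing import Callable, List, Optional, Tuple

# Assumed record model: (tag, [(subfield code, value), ...]) per data field.
Field = Tuple[str, List[Tuple[str, str]]]
Record = List[Field]

# Assumption: content subfields kept in the heading; others (e.g. $w) dropped.
HEADING_CODES = set("abvxyz")


def first_field(record: Record, is_wanted: Callable[[str], bool]) -> Optional[Field]:
    """Return the first field whose tag satisfies `is_wanted`, or None."""
    return next((field for field in record if is_wanted(field[0])), None)


def lccn(record: Record) -> str:
    """Column 1: MARC 010 $a, with padding spaces trimmed."""
    field = first_field(record, lambda tag: tag == "010")
    if field is None:
        return ""
    return next((value.strip() for code, value in field[1] if code == "a"), "")


def heading(record: Record) -> str:
    """Column 2: the 1XX heading, subfields joined with '--' (assumed style)."""
    field = first_field(record, lambda tag: tag.startswith("1"))
    if field is None:
        return ""
    return "--".join(value.strip() for code, value in field[1]
                     if code in HEADING_CODES)


def scope_note(record: Record) -> str:
    """Column 3: the $i and $a text of the first 680 field, if any."""
    field = first_field(record, lambda tag: tag == "680")
    if field is None:
        return ""
    return " ".join(value.strip() for code, value in field[1]
                    if code in ("i", "a"))


def subdivision_kind(record: Record) -> Optional[str]:
    """Hypothetical: 'fd' if the 185 starts with $v, 'gd' if it starts with $x."""
    field = first_field(record, lambda tag: tag == "185")
    if field is None or not field[1]:
        return None
    return {"v": "fd", "x": "gd"}.get(field[1][0][0])


def write_csv(records, out) -> None:
    """Write one row per record; the header row is an assumption."""
    writer = csv.writer(out)
    writer.writerow(["LCCN", "Heading", "Scope note"])
    for record in records:
        writer.writerow([lccn(record), heading(record), scope_note(record)])


if __name__ == "__main__":
    sample = [  # made-up record for illustration
        ("010", [("a", "sh00000001 ")]),
        ("150", [("a", "Cats"), ("x", "Behavior")]),
        ("680", [("i", "Here are entered works on"), ("a", "cats"),
                 ("i", "as pets.")]),
    ]
    buffer = io.StringIO()
    write_csv([sample], buffer)
    print(buffer.getvalue())
```

Passing any writable text stream to `write_csv` keeps the row-building logic independent of where the CSV ends up; in practice that would be a file opened with `newline=""`, as the `csv` module recommends.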
