/collections-as-data-notebooks

This repository compiles available Jupyter Notebooks for accessing/analyzing library collections as data

MIT License

Collections-as-Data Notebooks

A work-in-progress curated list of awesome Jupyter notebooks for querying and analyzing library collections as data. Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Collections as data is a movement to mediate library collections in computational formats. Jupyter Notebooks provide an ideal way to introduce data-based explorations of library collections. Much of this work occurs under the umbrella of GLAMLabs.

Inspired by lists like Awesome Jupyter: A Curated List, this repo pulls together Collections as Data Jupyter notebooks available on other GitHub repos, Binders, etc. For a useful introduction to Jupyter Notebooks for GLAM communities, see Quinn Dombrowski's piece. There was a previous attempt at such a repo a few years ago: Awesome Jupyter Glam. Relevant resources from that repo have been added to this list. (Note: changes made in Spring 2021 to Library Carpentry's repo, and the contents of the GLAM Workbench's new repo, still need to be reflected here.)

For an introduction to Jupyter Notebooks for digital methods in GLAM (Galleries, Libraries, Archives, and Museums), see the GLAM Workbench. The GLAM Workbench is the most comprehensive collection of Collections as Data notebooks currently available. It draws mostly on materials from the national libraries of Australia and New Zealand, and its GitHub repo is the easiest way to access the notebooks directly. All operational notebooks from the site will be added to this list.

One long-term goal is to convert as many of these as possible to Google Colab notebooks to improve reproducibility.

Table of Contents

Metadata

Text

Textual data for public domain materials is readily available through the APIs of various digital collections, such as HathiTrust or the National Library of Scotland.

  • National Library of Scotland Text Mining Notebooks - A half dozen notebooks for text mining corpora produced from the National Library of Scotland's collections.
  • HTRC Feature Reader - The HathiTrust Research Center's Feature Reader, which parses HathiTrust collections as extracted features: page-level part-of-speech tags and word frequencies for any given HathiTrust collection.
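To make the idea of page-level word frequencies concrete, here is a minimal sketch in plain Python. It does not use the HTRC Feature Reader API; the function names and the two sample pages are hypothetical, and the tokenizer is a deliberately simple assumption (lowercased runs of letters and apostrophes).

```python
import re
from collections import Counter

# Assumption: a token is a lowercased run of ASCII letters or apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z']+")


def page_token_counts(pages):
    """Return one Counter of token frequencies per page of text."""
    return [Counter(TOKEN_PATTERN.findall(page.lower())) for page in pages]


def volume_token_counts(pages):
    """Sum the per-page counts into a single volume-level Counter."""
    total = Counter()
    for counts in page_token_counts(pages):
        total.update(counts)
    return total


# Hypothetical sample data standing in for two pages of a volume.
sample_pages = [
    "The library holds the map.",
    "The map shows the old town.",
]

per_page = page_token_counts(sample_pages)
volume = volume_token_counts(sample_pages)
print(volume.most_common(3))
```

Keeping the counts per page, rather than only per volume, is what lets later analysis compare sections of a book or drop front matter before aggregating.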

Images

IIIF (International Image Interoperability Framework) standardizes the library curation of digital images, making scripts composed for querying IIIF data relatively easy to adapt to different digital collections.

  • Smithsonian IIIF Notebooks - Three separate notebooks for querying IIIF manifests, downloading images, and applying rudimentary facial recognition algorithms.
  • Library of Congress Notebooks - Includes notebooks for quantifying collections and working with IIIF images.
  • ContentDM and IIIF API - My notebooks for querying Temple Libraries' digital collections, in particular metadata and IIIF image files from ContentDM.
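As a rough illustration of why IIIF scripts transfer between collections, the sketch below reads image service identifiers from a manifest and builds image request URLs. It assumes a IIIF Presentation API 2.x manifest layout (sequences, canvases, images) and the Image API URL pattern of region/size/rotation/quality.format. The function names and the example.org manifest are hypothetical, and the `max` size keyword assumes Image API 2.1 or later.

```python
def image_service_ids(manifest):
    """Collect image service base URLs from a Presentation 2.x style manifest dict."""
    ids = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                # Some manifests omit the service or use another shape; skip those.
                if isinstance(service, dict) and "@id" in service:
                    ids.append(service["@id"])
    return ids


def iiif_image_url(service_id, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API request URL: {id}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{service_id.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical manifest fragment; a real one would be fetched as JSON from a collection.
sample_manifest = {
    "sequences": [{
        "canvases": [
            {"images": [{"resource": {"service": {"@id": "https://example.org/iiif/page-1"}}}]},
            {"images": [{"resource": {"service": {"@id": "https://example.org/iiif/page-2"}}}]},
        ]
    }]
}

# Request each page at a width of 200 pixels.
thumbnail_urls = [iiif_image_url(sid, size="200,")
                  for sid in image_service_ids(sample_manifest)]
print(thumbnail_urls)
```

Because only the manifest URL changes between institutions, the same two functions should work against any collection that publishes manifests in this layout. Presentation API 3.0 manifests use a different structure (`items` rather than `sequences`) and would need a separate parser.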

Spatial

  • Digital Archaeology - A set of Jupyter Notebooks by Shawn Graham focused on digital archaeology methods, including accessing the Chronicling America API.