
Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

PyGallica

A Python wrapper for the National Library of France's Gallica API. The package provides basic classes and methods for querying three Gallica services: the Search API, the IIIF API, and the Document API. No API keys are required.

The pygallica_api folder contains the original Python 2 version of the wrapper, and the python3 folder contains the Python 3 version. A Jupyter notebook in the jupyter folder walks through the wrapper's basic functionality in Python 3.

Getting Started

Download the project (for example, as a ZIP archive from the repository page) and unzip it. In your terminal, navigate to the project folder and install the requirements:

pip install -r requirements.txt

Then launch Python to start using the package.
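The import statements in the examples below assume the wrapper's modules (search_api, iiif_api, document_api) can be found on Python's module search path. A minimal sketch of one way to arrange this, assuming you start Python from the project root and that these modules live in the python3 folder:

```python
import sys
from pathlib import Path

# Assumption: Python was started from the project root, which contains
# the python3/ folder with the Python 3 modules.
package_dir = Path.cwd() / "python3"

# Prepend the folder so that imports such as `from search_api import Search`
# resolve to the Python 3 version of the wrapper.
if str(package_dir) not in sys.path:
    sys.path.insert(0, str(package_dir))
```

Starting Python from inside the python3 folder has the same effect without touching sys.path.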

Search API

The Search API lets you run keyword searches against Gallica's holdings and retrieve the XML those searches return.
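Gallica's search service follows the SRU protocol. To illustrate the kind of request a keyword search involves (this is not necessarily how Search.search builds its query), the sketch below assumes the endpoint https://gallica.bnf.fr/SRU and a CQL query of the form gallica all "..."; build_search_url is a hypothetical helper, not part of PyGallica:

```python
from urllib.parse import urlencode

# Assumed SRU endpoint for Gallica; check it against Gallica's API documentation.
SRU_ENDPOINT = "https://gallica.bnf.fr/SRU"


def build_search_url(*keywords, max_records=15):
    """Build an SRU searchRetrieve URL for a keyword search (hypothetical helper)."""
    # CQL query matching all keywords anywhere in a record.
    cql = 'gallica all "{}"'.format(" ".join(keywords))
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql,
        "maximumRecords": max_records,
    }
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return SRU_ENDPOINT + "?" + urlencode(params)


print(build_search_url("your", "keywords"))
```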

Example usage:

>>> from search_api import Search
>>> Search.search('your', 'keywords')

This returns the XML for your search and also saves it to a local file for easier parsing.
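Once the XML is saved, the standard library's ElementTree is enough to pull out basic information. The sketch below assumes the response uses the SRU 1.2 namespace http://www.loc.gov/zing/srw/ and Dublin Core titles; summarize_results is a hypothetical helper, and the file name results.xml is a placeholder, since the name PyGallica uses is not given here:

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for an SRU 1.2 response carrying Dublin Core records.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def summarize_results(xml_text):
    """Return (total hit count, list of titles) from an SRU response string."""
    root = ET.fromstring(xml_text)
    # numberOfRecords is a direct child of the searchRetrieveResponse root.
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    # dc:title elements may be nested at any depth inside each record.
    titles = [el.text for el in root.iter("{%s}title" % NS["dc"])]
    return total, titles


# Usage with the saved file (placeholder file name):
# with open("results.xml", encoding="utf-8") as f:
#     print(summarize_results(f.read()))
```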

IIIF API

The IIIF API lets you retrieve images from Gallica's holdings, along with the JSON metadata associated with those images. As a participant in IIIF (the International Image Interoperability Framework), Gallica makes all of the more than 100 million images in its digital library available through this API.

The iiif method takes six arguments, in order: an ARK ID, region, size, rotation, quality, and format.
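These six arguments correspond to the path segments of an image request under the IIIF Image API ({identifier}/{region}/{size}/{rotation}/{quality}.{format}). In IIIF terms, a region such as 0,1900,2400,1200 is a pixel crop given as x,y,w,h. A minimal sketch of the mapping, assuming Gallica's IIIF base URL is https://gallica.bnf.fr/iiif/ark:/ (build_iiif_url is a hypothetical helper, not part of PyGallica):

```python
# Assumed base URL for Gallica's IIIF Image API.
IIIF_BASE = "https://gallica.bnf.fr/iiif/ark:/"


def build_iiif_url(ark, region, size, rotation, quality, fmt):
    """Assemble an IIIF Image API URL from the six arguments (hypothetical helper)."""
    # IIIF path order: identifier/region/size/rotation/quality.format
    return "{}{}/{}/{}/{}/{}.{}".format(
        IIIF_BASE, ark.strip("/"), region, size, rotation, quality, fmt
    )


url = build_iiif_url("12148/btv1b90017179/f15", "0,1900,2400,1200",
                     "full", "0", "native", "jpg")
print(url)
```

Pasting the printed URL into a browser is a quick way to preview a crop before downloading it with the wrapper.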

Example usage:

>>> from iiif_api import IIIF
>>> IIIF.iiif('12148/btv1b90017179/f15', '0,1900,2400,1200', 'full', '0', 'native', 'jpg')

This saves the image in a new folder. To retrieve an image's metadata, pass its ARK ID to IIIF.metadata:

>>> from iiif_api import IIIF
>>> IIIF.metadata('12148/btv1b90017179/f15')
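Under the IIIF Image API, an image's technical metadata is published as an info.json document alongside the image, and this is presumably the JSON that the metadata method fetches. Assuming Gallica follows that layout, the sketch below builds the info.json URL and reads the full-image dimensions from its width and height fields; info_json_url and image_size are hypothetical helpers:

```python
import json

IIIF_BASE = "https://gallica.bnf.fr/iiif/ark:/"  # assumed base URL


def info_json_url(ark):
    """URL of the IIIF info.json document for an image (assumed layout)."""
    return IIIF_BASE + ark.strip("/") + "/info.json"


def image_size(info_text):
    """Return (width, height) from the text of an info.json document."""
    info = json.loads(info_text)
    # 'width' and 'height' are required properties of IIIF image information.
    return info["width"], info["height"]


print(info_json_url("12148/btv1b90017179/f15"))
```

Knowing the full dimensions first makes it easier to choose a valid x,y,w,h region for IIIF.iiif.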

Document API

The Document API lets you retrieve metadata about a particular document in Gallica's holdings. It offers several methods, each retrieving a different type of metadata.

Example usage, retrieving the OCR text of page 10 of the document with ARK ID bpt6k5619759j:

>>> from document_api import Document
>>> Document.ocr('bpt6k5619759j', '10')
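Gallica publishes OCR as ALTO XML, in which each recognized word is a String element whose CONTENT attribute holds the text. If the OCR you retrieve is in that form, plain text can be recovered as sketched below. The RequestDigitalElement endpoint and its O, E, and Deb parameters are assumptions based on Gallica's public Document API, and alto_url and alto_to_text are hypothetical helpers, not PyGallica methods:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def alto_url(ark, page):
    """Assumed Gallica endpoint for the ALTO OCR of one page (hypothetical helper)."""
    params = {"O": ark, "E": "ALTO", "Deb": page}
    return "https://gallica.bnf.fr/RequestDigitalElement?" + urlencode(params)


def local_name(tag):
    """Strip any '{namespace}' prefix so every ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml):
    """Join the words of an ALTO document into lines of plain text."""
    root = ET.fromstring(alto_xml)
    lines = []
    for el in root.iter():
        if local_name(el.tag) == "TextLine":
            # String children hold words; SP (space) and HYP elements are skipped.
            words = [child.get("CONTENT", "") for child in el
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)


print(alto_url("bpt6k5619759j", 10))
```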