/digitale_Romanistik

Primary LanguageJupyter NotebookCreative Commons Zero v1.0 UniversalCC0-1.0

Getting OCR-ed books from https://gallica.bnf.fr

The whole digital collection for various centuries is at https://gallica.bnf.fr/html/und/litteratures/les-classiques-de-la-litterature-acces-par-periode?mode=desktop. Our focus is on the following collections:

Our task is to download all OCR-ed books from that collection for three centuries. The problem is that the APIs (https://api.bnf.fr/api-gallica-de-recherche#/recherche and https://api.bnf.fr/fr/api-document-de-gallica#/texte%20brut/get__ark__texteBrut) do not provide search over collections.