
Web API for Project Gutenberg ebook metadata

Primary language: Python. License: MIT.

Gutendex

Gutendex is a simple, self-hosted web API for serving book catalog information from Project Gutenberg, an online library of free ebooks.

Try it at gutendex.com.

Why?

Project Gutenberg can be a useful source of literature, but its size makes it difficult to access and analyse on a large scale. An API for its catalog information makes it possible to automate these tasks.

How does it work?

Gutendex uses Django to download catalog data and serve it in a simple JSON REST API.

Project Gutenberg has no such public API of its own, but it publishes nightly archives of complicated XML files. Gutendex downloads these files, stores their data in a database, and publishes the data in a simpler format.

Installation

See the installation guide.

API

When your server is up and running, you should see a home page that says "Gutendex" at the root URL (e.g. http://localhost:8000 by default when using manage.py to serve on your local machine).

You should run your own server, but you can test queries at gutendex.com.

Lists of Books

Lists of book information in the database are queried using the API at /books (e.g. gutendex.com/books). Book data is returned in this JSON format:

{
  "count": <number>,
  "next": <string or null>,
  "previous": <string or null>,
  "results": <array of Books>
}

where results is an array of 0-32 book objects, next and previous are URLs to the next and previous pages of results, and count is the total number of books for the query on all pages combined.

Books are ordered by popularity, determined by their numbers of downloads from Project Gutenberg.
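Because each page carries a next URL, a client can walk a whole result set by following next until it is null. The following is a minimal sketch of that loop, not part of Gutendex itself: the names fetch_json and iter_books and the max_pages parameter are hypothetical, and the fetch function is injectable so the loop can be tried without a network connection.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (performs a real HTTP request)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_books(
    first_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield book objects page by page, following `next` until it is null."""
    url = first_url
    pages = 0
    while url is not None and (max_pages is None or pages < max_pages):
        page = fetch(url)
        yield from page["results"]  # 0-32 book objects per page
        url = page["next"]          # None (JSON null) on the last page
        pages += 1
```

For example, iter_books("http://localhost:8000/books", max_pages=2) would yield the books on the first two pages, most popular first, assuming a server is running at that address.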

Parameters can also be added to book-list queries in a typical URL format. For example, to get the first page of books written by authors alive after 1899 and published in English or French, you can go to /books?author_year_start=1900&languages=en,fr
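Query strings like this one can be assembled with Python's standard library rather than by hand. This is a small sketch under stated assumptions: build_books_url is a hypothetical helper, not something Gutendex provides, and it joins list values with commas to match the comma-separated style used in this document.

```python
from urllib.parse import urlencode


def build_books_url(base_url: str, **params) -> str:
    """Build a /books URL; list or tuple values become comma-separated."""
    query = {}
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        query[name] = value
    url = base_url.rstrip("/") + "/books"
    if not query:
        return url
    # safe="," leaves commas unescaped so the URL reads like the examples here.
    return url + "?" + urlencode(query, safe=",")
```

Calling build_books_url("http://localhost:8000", author_year_start=1900, languages=["en", "fr"]) produces the query shown above against a local server. Boolean-like values such as those for copyright should be passed as the strings "true", "false", or "null", since Python's own True would be rendered as "True".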

You can find available query parameters below.

author_year_start and author_year_end

Use these to find books with at least one author alive in a given range of years. They must have positive or negative integer values. For example, /books?author_year_end=-499 gives books with authors alive before 500 BCE, and /books?author_year_start=1800&author_year_end=1899 gives books with authors alive in the 19th Century.

copyright

Use this to find books with a certain copyright status: true for books with existing copyrights, false for books in the public domain in the USA, or null for books with no available copyright information. These can be combined with commas. For example, /books?copyright=true,false gives books with available copyright information.

ids

Use this to list books with Project Gutenberg ID numbers in a given list of numbers. They must be comma-separated positive integers. For example, /books?ids=11,12,13 gives books with ID numbers 11, 12, and 13.

languages

Use this to find books in any of a list of languages. They must be comma-separated, two-character language codes. For example, /books?languages=en gives books in English, and /books?languages=fr,fi gives books in either French or Finnish or both.

mime_type

Use this to find books with a given MIME type. Gutendex gives every book with a MIME type starting with the value. For example, /books?mime_type=text%2F gives books with types text/html, text/plain; charset=us-ascii, etc.; and /books?mime_type=text%2Fhtml gives books with types text/html, text/html; charset=utf-8, etc.
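The prefix rule can be illustrated on the client side. The sketch below is only an illustration of "starts with" matching and of percent-encoding the slash; mime_type_query and matches_mime_prefix are hypothetical names, and the case-sensitive comparison is an assumption rather than a statement about how the server compares values.

```python
from urllib.parse import quote


def mime_type_query(prefix: str) -> str:
    """Return a /books query path with the MIME prefix percent-encoded."""
    # safe="" forces "/" to be encoded as %2F, as in the examples above.
    return "/books?mime_type=" + quote(prefix, safe="")


def matches_mime_prefix(mime_type: str, prefix: str) -> bool:
    """Client-side illustration of prefix matching (assumed case-sensitive)."""
    return mime_type.startswith(prefix)
```

With these helpers, the prefix "text/" matches both "text/html" and "text/plain; charset=us-ascii", while "text/html" matches only the HTML variants.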

search

Use this to search author names and book titles for the given words. The words must be separated by spaces (i.e. %20 in URL-encoded format) and are case-insensitive. For example, /books?search=dickens%20great includes Great Expectations by Charles Dickens.
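If search words come from user input, encoding them programmatically avoids mistakes with spaces and non-ASCII characters. A minimal sketch, where search_query is a hypothetical helper name:

```python
from urllib.parse import quote


def search_query(*words: str) -> str:
    """Join search words with spaces and percent-encode the result."""
    # quote() encodes a space as %20 (quote_plus would emit "+" instead).
    return "/books?search=" + quote(" ".join(words), safe="")
```

Here search_query("dickens", "great") yields the same path as the example above.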

topic

Use this to search for a case-insensitive key-phrase in books' bookshelves or subjects. For example, /books?topic=children gives books on the "Children's Literature" bookshelf, with the subject "Sick children -- Fiction", and so on.

Individual Books

Individual books can be found at /books/<id>, where <id> is the book's Project Gutenberg ID number. Error responses will appear in this format:

{
  "detail": <string of error message>
}
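A client may want to turn such an error body into an exception instead of passing it along as if it were a book. The following is one possible sketch: GutendexError and parse_book_response are hypothetical names, and the check relies only on the "detail" key described above.

```python
import json


class GutendexError(Exception):
    """Hypothetical exception raised for an error response body."""


def parse_book_response(body: str) -> dict:
    """Decode a /books/<id> response body, raising on an error object."""
    data = json.loads(body)
    # An error response carries "detail"; a Book carries "id" and other fields.
    if isinstance(data, dict) and "detail" in data and "id" not in data:
        raise GutendexError(data["detail"])
    return data
```

Checking the HTTP status code as well would be a sensible addition in a real client; this sketch inspects only the JSON body.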

API Objects

Types of JSON objects served by Gutendex are given below.

Author

{
  "birth_year": <number or null>,
  "death_year": <number or null>,
  "name": <string>
}

Book

{
  "id": <number of Project Gutenberg ID>,
  "title": <string>,
  "authors": <array of Authors>,
  "subjects": <array of strings>,
  "bookshelves": <array of strings>,
  "languages": <array of strings>,
  "copyright": <boolean or null>,
  "media_type": <string>,
  "formats": <Format>,
  "download_count": <number>
}

Format

{
  <string of MIME-type>: <string of URL>,
  ...
}
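On the client side, these objects map naturally onto typed records. The sketch below mirrors the fields listed above with Python dataclasses; the class layout and the helpers book_from_json and find_format_url are hypothetical conveniences rather than anything Gutendex ships, and fields not listed in this document are ignored in case a server returns extra keys.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class Author:
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]


@dataclass
class Book:
    id: int
    title: str
    authors: List[Author]
    subjects: List[str]
    bookshelves: List[str]
    languages: List[str]
    copyright: Optional[bool]
    media_type: str
    formats: Dict[str, str]  # MIME type -> URL
    download_count: int


def book_from_json(data: dict) -> Book:
    """Convert a decoded Book object into a Book, ignoring unknown keys."""
    known = {f.name for f in fields(Book)}
    values = {key: value for key, value in data.items() if key in known}
    values["authors"] = [
        Author(a["name"], a["birth_year"], a["death_year"])
        for a in data["authors"]
    ]
    return Book(**values)


def find_format_url(book: Book, mime_prefix: str) -> Optional[str]:
    """Return the first format URL whose MIME type starts with the prefix."""
    for mime_type, url in book.formats.items():
        if mime_type.startswith(mime_prefix):
            return url
    return None
```

For instance, find_format_url(book, "text/plain") would return a plain-text URL for the book if one is listed in its formats, and None otherwise.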