r-three/common-pile

Hathi Trust

Closed this issue · 1 comments

https://www.hathitrust.org/member-libraries/resources-for-librarians/data-resources/research-datasets/

HathiTrust makes public domain works available for bulk download on request for non-commercial research purposes.

There are four dataset variants, depending on where the researcher is located and whether Google-digitized volumes are included.

Public domain text, excluding Google-digitized volumes

  1. Dataset for researchers in the U.S. -> 814,045 public domain and Creative Commons-licensed volumes (480GB)
  2. Dataset for researchers outside the U.S. -> 610,575 volumes (351GB)

All public domain text, including Google-digitized volumes

  1. Researchers in the U.S. -> 6,649,535 public domain and Creative Commons-licensed volumes (5.4TB)
  2. Researchers outside the U.S. -> 4,316,648 volumes (3.4TB)
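
To get a rough sense of scale, the figures listed above can be tabulated in a few lines of Python. This is only a sketch. The names `DATASETS`, `google_only_volumes`, and `avg_mb_per_volume` are made up for illustration. It assumes the "including Google" sets fully contain the "excluding Google" sets, and that the quoted sizes use decimal units (1 TB = 1000 GB).

```python
# Volume counts and sizes copied from the list above.
# Sizes are in GB, assuming decimal units (5.4TB -> 5400GB).
DATASETS = {
    ("excluding_google", "us"): (814_045, 480),
    ("excluding_google", "non_us"): (610_575, 351),
    ("including_google", "us"): (6_649_535, 5400),
    ("including_google", "non_us"): (4_316_648, 3400),
}


def google_only_volumes(region):
    """Volumes present only in the Google-inclusive set for a region.

    Assumes the inclusive set is a strict superset of the exclusive one.
    """
    with_google = DATASETS[("including_google", region)][0]
    without_google = DATASETS[("excluding_google", region)][0]
    return with_google - without_google


def avg_mb_per_volume(volumes, size_gb):
    """Average size per volume in MB (decimal units assumed)."""
    return size_gb * 1000 / volumes


if __name__ == "__main__":
    for region in ("us", "non_us"):
        print(region, "Google-digitized only:", google_only_volumes(region))
    for key, (volumes, size_gb) in DATASETS.items():
        print(key, f"{avg_mb_per_volume(volumes, size_gb):.2f} MB/volume")
```

Under those assumptions, the large majority of volumes in either region come from the Google-digitized portion.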

I think the version that includes Google-digitized volumes is a superset of #18.

As discussed on today's call, even for public domain books HathiTrust requires us to agree to terms that we do not wish to agree to. We can try to negotiate with them, but for now this is a no.