Archive.org-Downloader

Python 3 script to download archive.org books in PDF format.

About The Project

There are many great books available on https://openlibrary.org/ and https://archive.org/. However, many of them must be borrowed, loans last only 1 hour to 14 days, and there is no option to download a book as a PDF to read offline or share with your friends. I created this program to solve this problem and retrieve the original book in PDF format for free.

The download takes a few minutes, depending on the number of pages and the image quality you have selected. You must also create an account on https://archive.org/ to download books that need to be borrowed.

Getting Started

To get started, you need Python 3 installed. If you do not have it, you can download it here: https://www.python.org/downloads/

Installation

Make sure you have git installed. Then run the following commands to get the scripts on your computer:

git clone https://github.com/MiniGlome/Archive.org-Downloader.git
cd Archive.org-Downloader

The script requires the modules requests, tqdm and img2pdf. You can install them all at once with this command:

pip install -r requirements.txt
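To confirm the installation worked, a small check like the one below can help. It is only a sketch and not part of the downloader: the helper name missing_modules is hypothetical, and the module list is taken from the sentence above.

```python
import importlib.util

# Module names taken from the README; this check is an illustrative helper,
# not part of the downloader itself.
REQUIRED_MODULES = ["requests", "tqdm", "img2pdf"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If any module is reported as missing, running the pip command again inside the same Python environment is usually the fix.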

Usage

usage: archive-org-downloader.py [-h] [-e EMAIL] [-p PASSWORD] [-s SKIP_LOGIN]
                                 [-u URL] [-d DIR] [-f FILE] [-r RESOLUTION]
                                 [-t THREADS] [-j] [-l LOAN_ALWAYS]

optional arguments:
  -h, --help            show this help message and exit
  -e EMAIL, --email EMAIL
                        Your archive.org email
  -p PASSWORD, --password PASSWORD
                        Your archive.org password
  -s SKIP_LOGIN, --skip_login SKIP_LOGIN
                        Use this if you want to download books which do not
                        need to be borrowed.
  -u URL, --url URL     Link to the book (https://archive.org/details/XXXX).
                        You can use this argument several times to download
                        multiple books
  -d DIR, --dir DIR     Output directory
  -f FILE, --file FILE  File where are stored the URLs of the books to
                        download
  -r RESOLUTION, --resolution RESOLUTION
                        Image resolution (10 to 0, 0 is the highest), [default
                        3]
  -t THREADS, --threads THREADS
                        Maximum number of threads, [default 50]
  -j, --jpg             Output to individual JPG's rather than a PDF
  -l LOAN_ALWAYS, --loan_always LOAN_ALWAYS
                        Always try to loan all added books. Can improve speed
                        if you're planning to download only books that need to
                        be loaned.

The email and password arguments are required if the books you want to download need to be borrowed. The -r argument specifies the resolution of the images, from 10 to 0 (0 is the best quality, 3 is the default). The PDFs are saved in the current folder unless you set an output directory with -d.
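The rules above can be illustrated with a short argparse sketch. It is a reconstruction from the help text, not the script's actual source: build_parser and check_args are hypothetical names, and treating any value passed to -s as "skip login" is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="archive-org-downloader.py")
    parser.add_argument("-e", "--email", help="Your archive.org email")
    parser.add_argument("-p", "--password", help="Your archive.org password")
    parser.add_argument("-s", "--skip_login", help="Skip login for books that need no loan")
    # action="append" lets -u be repeated; each URL is added to a list.
    parser.add_argument("-u", "--url", action="append", help="Link to the book")
    parser.add_argument("-d", "--dir", help="Output directory")
    parser.add_argument("-f", "--file", help="File containing the book URLs")
    parser.add_argument("-r", "--resolution", type=int, default=3,
                        help="Image resolution (10 to 0, 0 is the highest)")
    parser.add_argument("-t", "--threads", type=int, default=50,
                        help="Maximum number of threads")
    parser.add_argument("-j", "--jpg", action="store_true",
                        help="Output individual JPGs rather than a PDF")
    parser.add_argument("-l", "--loan_always", help="Always try to loan all added books")
    return parser


def check_args(args):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not 0 <= args.resolution <= 10:
        problems.append("resolution must be between 0 and 10")
    # Assumption: any value given to -s means the login step is skipped.
    if not args.skip_login and not (args.email and args.password):
        problems.append("email and password are required unless login is skipped")
    if not args.url and not args.file:
        problems.append("provide at least one -u URL or a --file")
    return problems


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    for problem in check_args(parsed):
        print("Problem:", problem)
```

Running the sketch with the same flags you plan to pass to the downloader gives a quick sanity check before starting a long download.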

Example

This command will download the 3 books as PDFs in the best possible quality. To download only the individual images, add --jpg.

python3 archive-org-downloader.py -e myemail@tempmail.com -p Passw0rd -r 0 -u https://archive.org/details/sim_interview_1998-11_28_11 -u https://archive.org/details/sim_interview_1998-05_28_5 -u https://archive.org/details/sim_interview_1997-11_27_11

This command downloads books that do not need to be borrowed, so no archive.org account is required:

python3 archive-org-downloader.py -s true -r 0 -u https://archive.org/details/IntermediatePython -u https://archive.org/details/jstor-4560629 -u https://archive.org/details/jstor-41455353

If you want to download a lot of books in a row, you can paste the URLs of the books in a .txt file (one per line) and use --file:

python3 archive-org-downloader.py -e myemail@tempmail.com -p Passw0rd --file books_to_download.txt
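If your list of books comes from another program, the URL file can be generated rather than pasted by hand. The sketch below is one way to do that; write_url_file is a hypothetical helper, and rejecting anything that does not start with https://archive.org/details/ is an assumption based on the URL form shown in the help text.

```python
from pathlib import Path

# URL form shown in the help text for -u.
DETAILS_PREFIX = "https://archive.org/details/"


def write_url_file(urls, path="books_to_download.txt"):
    """Write one book URL per line, skipping blanks and duplicates.

    Returns the list of URLs actually written.
    """
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # ignore empty entries
        if not url.startswith(DETAILS_PREFIX):
            raise ValueError(f"not an archive.org details URL: {url}")
        if url not in cleaned:
            cleaned.append(url)  # keep first occurrence, preserve order
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


if __name__ == "__main__":
    written = write_url_file([
        "https://archive.org/details/IntermediatePython",
        "https://archive.org/details/jstor-4560629",
    ])
    print(f"Wrote {len(written)} URLs to books_to_download.txt")
```

The resulting file can then be passed to the script with --file as in the command above.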

Donation

If you want to support my work, you can send 2 or 3 Bitcoins 🙃 to this address:

bc1q4nq8tjuezssy74d5amnrrq6ljvu7hd3l880m7l
