This is a set of Python scripts that downloads all Dutch ebooks from Project Gutenberg, renames them to human-readable filenames, formats them so they display well on my ebook reader, and tosses them into subdirectories for easier navigation.

Written by Michiel Overtoom, motoom@xs4all.nl

How to use:

- Run bulkdownload.py to download the raw texts from a mirror of Project Gutenberg's eBook archive.
- Run gutenberg.py to reformat and rename the raw texts.
- Run toss.py to distribute them over subdirectories.

After that, upload them to your eBook reader, and enjoy!

In March 2016 I reworked this program, because scraping Gutenberg's main web site is no longer allowed. This newer version:

- downloads from a mirror instead of scraping Gutenberg's main web site
- lets you specify the language
- detects the input encoding more reliably
- outputs UTF-8 encoded text files
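The reformatting step has to read raw texts in different encodings and write them out as UTF-8. The scripts' actual detection logic is not shown here. What follows is a minimal sketch of one common approach, which tries a fixed list of candidate encodings in order. The candidate list and the function names `decode_text` and `convert_to_utf8` are hypothetical, not taken from gutenberg.py.

```python
from pathlib import Path

# Hypothetical candidate list, tried in order. utf-8-sig also strips a
# leading byte-order mark. cp1252 rejects a few undefined bytes, and
# latin-1 maps every byte, so it serves as the final fallback.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_text(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Unreachable while latin-1 is in the list, kept in case the list changes.
    raise ValueError("could not decode input with any candidate encoding")


def convert_to_utf8(src: Path, dst: Path) -> str:
    """Read src with a guessed encoding, write it to dst as UTF-8, return the guess."""
    text, encoding = decode_text(src.read_bytes())
    dst.write_text(text, encoding="utf-8")
    return encoding
```

Trying strict UTF-8 first works because text in a legacy 8-bit encoding rarely happens to form valid UTF-8 byte sequences. A wrong guess therefore tends to fail loudly rather than silently produce garbage.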
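The last step spreads the renamed files over subdirectories so the reader's file browser stays navigable. The grouping rule that toss.py actually uses is not described here. The sketch below assumes a hypothetical rule: each file goes into a folder named after the first letter of its filename, and anything else goes into `_`. The names `bucket_for` and `toss` are illustrative only.

```python
import shutil
from pathlib import Path


def bucket_for(filename: str) -> str:
    """Hypothetical rule: subdirectory named after the uppercased first letter."""
    first = filename[:1].upper()
    return first if first.isalpha() else "_"


def toss(src_dir: Path, dst_dir: Path) -> None:
    """Move every .txt file in src_dir into a per-letter subdirectory of dst_dir."""
    for path in sorted(src_dir.glob("*.txt")):
        target = dst_dir / bucket_for(path.name)
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
```

If the filenames start with the author's name, a per-letter split like this gives alphabetical folders by author. That assumption about the naming scheme is what makes the rule useful.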
motoom/gutenberg-ebook-scraping
Download, convert and organize Gutenberg books for eBook Readers
Python