This repository contains a simple scraper that converts HTML pages on online-literature.com to Markdown files. The scraper takes a page link as input and creates one Markdown file for every chapter found on that page.
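As a rough illustration of the "one Markdown file per chapter" idea, here is a minimal sketch that collects chapter links from a book page and derives a file name for each. It is not the code in ol-scraper.py: it uses the standard-library html.parser instead of bs4 so it runs without extra dependencies, and the sample markup, the class name ChapterLinkParser, the helper chapter_filename, and the file-naming scheme are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute_url) pairs for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.links.append((title, self._href))
            self._href = None


def chapter_filename(index, title):
    """Build a file name for a chapter (hypothetical naming scheme)."""
    return f"{index:02d}-{'-'.join(title.split())}.md"


# Hypothetical sample markup, not copied from the real site.
SAMPLE_HTML = """
<ul>
  <li><a href="1/">Chapter 1</a></li>
  <li><a href="2/">Chapter 2</a></li>
</ul>
"""

parser = ChapterLinkParser("http://www.online-literature.com/author/book-name/")
parser.feed(SAMPLE_HTML)
files = [chapter_filename(i, title)
         for i, (title, _url) in enumerate(parser.links, start=1)]
```

A real page contains many links that are not chapters, so an actual scraper would need to filter them; that filtering is left out here because the README does not describe it.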
To get started, clone this repository and run the script. Its only external dependency is the bs4 module (Beautiful Soup 4), which can be installed using pip or a similar Python package installer:
$ pip install bs4
For basic use, simply run:
$ python3 ol-scraper.py [hyperlink]
For instance, to download the book Democracy: An American Novel by Henry Adams, run:
$ python3 ol-scraper.py http://www.online-literature.com/henry-adams/democracy-an-american-novel/
Additional options include:
usage: ol-scraper.py [-h] [-o --output-path out-path] [-d --directory-name dir-name]
                     [-p --path-and-name full-path]
                     hyperlink

Pull books from online-literature.com

positional arguments:
  hyperlink             Hyperlink to main page of the book. Like:
                        http://www.online-literature.com/author/book-name/

options:
  -h, --help            show this help message and exit
  -o --output-path out-path
                        Path of output directory (default ./)
  -d --directory-name dir-name
                        Name of output directory (default: Title-Of-Book/
                        (dynamically determined))
  -p --path-and-name full-path
                        Set path and name to output directory at once with one
                        single string (instead of setting output path and
                        directory name separately.)
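
To make the interaction between the three output options concrete, here is a minimal sketch of how such a command line could be parsed and turned into an output directory. It is an illustration only, not the code in ol-scraper.py: the function names build_parser and resolve_output_dir are hypothetical, and both the precedence of -p over -o/-d and the way the default directory name is derived from the book title are assumptions based on the help text above.

```python
import argparse
import os
import re


def build_parser():
    """Build a parser with the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="ol-scraper.py",
        description="Pull books from online-literature.com",
    )
    parser.add_argument("hyperlink",
                        help="Hyperlink to main page of the book")
    parser.add_argument("-o", "--output-path", default="./",
                        help="Path of output directory")
    parser.add_argument("-d", "--directory-name", default=None,
                        help="Name of output directory")
    parser.add_argument("-p", "--path-and-name", default=None,
                        help="Path and name of output directory in one string")
    return parser


def resolve_output_dir(args, book_title):
    """Return the directory the chapter files would be written to."""
    # Assumption: -p, when given, overrides both -o and -d.
    if args.path_and_name:
        return args.path_and_name
    # Assumption: the default name is the title with its words joined by
    # hyphens, e.g. "Democracy: An American Novel" -> "Democracy-An-American-Novel".
    name = args.directory_name or "-".join(re.findall(r"[A-Za-z0-9]+", book_title))
    return os.path.join(args.output_path, name)
```

With this sketch, passing only the hyperlink yields a directory under ./ named after the book, `-o books -d democracy` yields books/democracy, and `-p /tmp/democracy` yields /tmp/democracy regardless of the other two options.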
This repository is published under the MIT License.