lagout

Get all the free e-book resources from the online library "doc.lagout.org"

How to use this project

Prerequisites

  1. Install Python 3.x on your computer. You can download the latest release from python.org.

  2. Install the pip tool. Recent Python installers bundle it; otherwise run python3 -m ensurepip.

  3. Install the scrapy module with pip. On Windows, run:

    pip install scrapy

    On Linux, run:

    python3 -m pip install --user scrapy
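Before moving on, you can confirm the toolchain is in place. A quick sanity check (commands assume a Unix-like shell; on Windows, `py -3` replaces `python3`):

```shell
# Confirm that Python 3 and pip are available before installing Scrapy
python3 --version
python3 -m pip --version
```

If either command fails, revisit steps 1 and 2 above before installing Scrapy.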

Quick Demonstration

If you have already completed the Scrapy setup above, all you need to do is clone the repository to your local disk and run these commands in your cmd or bash console:

cd lagout
scrapy crawl lagout

where lagout is the name of the spider defined in this project.

Take a nap or grab a cup of coffee, and enjoy more than 60,000 free e-books. What you do with the crawled results is up to you: build a mirror of the original site, or download only the resources you need.

Detailed Description

  1. To start a new Scrapy project, run:

    scrapy startproject lagout

  2. Now we have a Scrapy project called "lagout", generated from Scrapy's built-in templates. Next, create a new file in the lagout/lagout/spiders directory, say "lagout_spider.py", and write the crawling logic there according to your needs.

  3. For more details, see the official Scrapy documentation and its tutorial at docs.scrapy.org.