Scrapper is built on Node.js and can easily be configured to scrape a website by editing a configuration file.
- Make sure you have Node.js and npm installed. See the instructions on how to install them on Windows.
- Clone the repository from GitHub:
$ git clone https://github.com/searchodev/shanework scrapper
- Go to the directory where you cloned the repository and install the Node modules:
$ cd scrapper
$ npm install
The following configuration files under the conf directory can be modified:
- db.json holds the database connection info.
- scrapper.json is used to configure the scrapper, using CSS selectors or values specified as JSON.
- schedule.json stores the information needed to run the scrapper automatically on a specified schedule.
The scheduler uses cron format. The example below runs the fabfurnish job every Saturday at 8:05:
{"job":"fabfurnish", "schedule": "5 8 * * 6"}
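To make the cron expression easier to read, the sketch below splits a five-field schedule string into named parts. It is only an illustration: `describeCron` and `FIELD_NAMES` are hypothetical names, not part of the scrapper, and the field order assumed here is the standard five-field cron layout (minute, hour, day of month, month, day of week).

```javascript
// Hypothetical helper, not part of the scrapper code base.
// Assumes the standard five-field cron layout.
const FIELD_NAMES = ["minute", "hour", "dayOfMonth", "month", "dayOfWeek"];

function describeCron(expression) {
  // Split on any run of whitespace so extra spaces are tolerated.
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== FIELD_NAMES.length) {
    throw new Error(`Expected 5 cron fields, got ${parts.length}`);
  }
  const fields = {};
  FIELD_NAMES.forEach((name, i) => {
    fields[name] = parts[i];
  });
  return fields;
}

// The entry from the example above.
const entry = { job: "fabfurnish", schedule: "5 8 * * 6" };

// minute "5" and hour "8" give 8:05; dayOfWeek "6" is Saturday (0 is Sunday).
console.log(entry.job, describeCron(entry.schedule));
```

Reading the fields left to right, `5 8 * * 6` means minute 5 of hour 8, on any day of the month, in any month, on day 6 of the week.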
Under the conf\sources directory, add your source configurations together with their category mappings.
Run the following in your terminal, replacing <source> with the name of a configured source:
$ node app.js <source>
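For readers unfamiliar with how a Node.js script receives a command-line argument such as the source name, here is a minimal sketch. It does not reproduce the actual app.js; `parseSourceArg` is a hypothetical helper, and the only real API used is Node's `process.argv`.

```javascript
// Hypothetical sketch; the real app.js may handle its arguments differently.
function parseSourceArg(argv) {
  // argv[0] is the node binary and argv[1] the script path,
  // so user-supplied arguments start at index 2.
  const source = argv[2];
  return source ? source : null;
}

const source = parseSourceArg(process.argv);
if (source === null) {
  console.log("Usage: node app.js <source>");
} else {
  console.log(`Source requested: ${source}`);
}
```

With this sketch, `node app.js fabfurnish` would pass `fabfurnish` as the source name, and running the script with no argument prints the usage line.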