/shanework

Working directory created for Node.js work for Shane

Primary language: JavaScript

Scrapper

Getting Started

Scrapper is built on Node.js and can easily be configured to scrape a website by editing its configuration files.

Installation

  1. Make sure you have Node.js and NPM installed. See instructions on how to install on Windows.

  2. Clone the repository from GitHub

$ git clone https://github.com/searchodev/shanework scrapper

  3. Go to the directory where you cloned the repository and install the Node modules

$ cd scrapper
$ npm install

Configuration

Two configuration files under the conf directory can be modified:

  • db.json which holds the database connection info.
  • scrapper.json which configures the scrapper using CSS selectors or options specified as JSON.
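
As an illustration of how files like these can be read, here is a minimal sketch that loads a JSON file from a configuration directory. The helper name loadConfig is hypothetical and not taken from the repository, and the contents of db.json and scrapper.json are not documented here, so the sketch makes no assumptions about their keys.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of the repository): read and parse one
// JSON file from a configuration directory.
function loadConfig(confDir, fileName) {
  const filePath = path.join(confDir, fileName);
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
  } catch (err) {
    // Report which file failed, whether it was missing or malformed.
    throw new Error('Could not load ' + filePath + ': ' + err.message);
  }
}

// Usage, assuming the files live in ./conf relative to the working directory:
// const db = loadConfig('conf', 'db.json');
// const scrapper = loadConfig('conf', 'scrapper.json');
```

Wrapping the parse in a try/catch means a malformed file is reported by name, which can help when editing JSON by hand.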

Scheduler Configuration

schedule.json stores the information used to run the scrapper automatically on a specified schedule.

The scheduler uses cron format. The example below runs the fabfurnish job every Saturday at 08:05:

{"job":"fabfurnish", "schedule": "5 8 * * 6"}
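
To make the five cron fields concrete, here is a minimal sketch that checks whether a given date matches an expression such as "5 8 * * 6". It is not the scheduler this project uses. The function matchesCron is a hypothetical illustration that handles only "*" and plain integers, and it requires every field to match.

```javascript
// Minimal sketch: check whether a Date matches a five-field cron expression.
// Supports only "*" and plain integers (no ranges, lists, or steps), and
// requires every field to match.
function matchesCron(expression, date) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error('Expected 5 cron fields, got ' + fields.length);
  }
  // Field order: minute, hour, day of month, month (1-12), day of week (0 = Sunday).
  const actual = [
    date.getMinutes(),
    date.getHours(),
    date.getDate(),
    date.getMonth() + 1,
    date.getDay(),
  ];
  return fields.every((field, i) => field === '*' || Number(field) === actual[i]);
}

// Entry in the same shape as the schedule.json example above.
const entry = { job: 'fabfurnish', schedule: '5 8 * * 6' };

// 6 January 2024 was a Saturday; local time is used throughout.
console.log(matchesCron(entry.schedule, new Date(2024, 0, 6, 8, 5))); // true
console.log(matchesCron(entry.schedule, new Date(2024, 0, 7, 8, 5))); // false (Sunday)
```

Reading the fields left to right, "5 8 * * 6" means minute 5, hour 8, any day of the month, any month, day of week 6 (Saturday).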

Under the conf\sources directory, add the configuration for each of your sources together with its category mapping.

Running the Application

Run node app.js <source> in your terminal.
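
For readers unfamiliar with how a Node.js script receives a command-line argument like <source>, the following is a minimal sketch. The helper getSourceArg is hypothetical and is not taken from app.js. Using fabfurnish as a source name is also an assumption, based on the job name in the schedule example.

```javascript
// Hypothetical helper (not taken from app.js): extract the <source>
// argument from a process.argv-style array.
function getSourceArg(argv) {
  // argv[0] is the node binary and argv[1] is the script path,
  // so the first user-supplied argument is argv[2].
  const source = argv[2];
  if (!source) {
    throw new Error('Usage: node app.js <source>');
  }
  return source;
}

// Simulates running: node app.js fabfurnish
console.log(getSourceArg(['node', 'app.js', 'fabfurnish'])); // fabfurnish
```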