scrapi

Scrapers + API

Getting started

  • You will need to:
    • Install the Python requirements
    • Install Elasticsearch
    • Install the consumers
    • Install RabbitMQ

Requirements

  • Create and activate a virtual environment for scrapi, then go to the top-level project directory. From there, run
$ pip install -r requirements.txt

and the Python requirements for the project will be downloaded and installed.
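
If you have not set up a virtual environment before, one way to do it (a sketch using virtualenv; the built-in venv module or virtualenvwrapper work just as well, and scrapi-env below is only an example name) is

$ pip install virtualenv
$ virtualenv scrapi-env
$ source scrapi-env/bin/activate

and then run the pip install command above from inside the activated environment.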

Installing Elasticsearch

Note: JDK 7 must be installed for Elasticsearch to run.

Mac OSX

$ brew install elasticsearch

Now, just run

$ elasticsearch

or

$ invoke elasticsearch

and you should be good to go.
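
Elasticsearch listens on port 9200 by default, so a quick way to check that it is actually running is

$ curl http://localhost:9200

which should return a short JSON document describing the node.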

Ubuntu

$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.deb 
$ sudo dpkg -i elasticsearch-1.2.1.deb

Now, just run

$ sudo service elasticsearch start

or

$ invoke elasticsearch

and you should be good to go.
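
If you also want Elasticsearch to come back up after a reboot, you can register the init script (a sketch; the runlevel arguments below are the conventional defaults, so check them against the Elasticsearch documentation for your version):

$ sudo update-rc.d elasticsearch defaults 95 10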

Running the server

  • Just run
$ python main.py

from the scrapi/website/ directory, and the server should be up and running!
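
To double-check that the server is responding, you can hit it with curl. The address below is only an assumption; the actual host and port are printed when main.py starts, so substitute whatever appears there:

$ curl http://localhost:5000/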

Consumers

  • Just run
$ invoke install_consumers

and the consumers specified in the worker_manager manifest files, along with their requirements, will be installed.
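
If you are curious which consumers that covers, you can look at the manifest files themselves (assuming they live in a manifests/ directory under worker_manager; adjust the path if your checkout is laid out differently):

$ ls worker_manager/manifests/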

RabbitMQ

Mac OSX

$ brew install rabbitmq

Ubuntu

$ sudo apt-get install rabbitmq-server
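
The Ubuntu package normally starts the broker for you; with Homebrew you start it yourself (Homebrew installs rabbitmq-server under /usr/local/sbin, which may not be on your PATH). If the broker is not already running, start it with

$ rabbitmq-server -detached

and check that it is up (drop the sudo on OSX, where the broker runs as your own user) with

$ sudo rabbitmqctl status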

Running the scheduler

  • From the top-level project directory, run
$ invoke celery_beat

to start the scheduler, and

$ invoke celery_worker

to start the worker.
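
Note that celery_beat only schedules and queues tasks; a worker has to be running as well to actually execute them. In practice that means running the two commands in separate terminals, or backgrounding one of them:

$ invoke celery_beat &
$ invoke celery_worker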

Testing

  • To run the tests for the project, just type
$ invoke test

and all of the tests in the 'tests/' directory will be run.
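
If you want to run a single test module while developing, and the suite is nose-compatible (an assumption; invoke test may wrap a different runner), you can point the runner at one file, where tests/test_example.py stands in for a real module from the tests/ directory:

$ nosetests tests/test_example.py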