- To run absolutely everyting, you will need to:
- Install requirements.
- Install Elasticsearch
- Install Cassandra
- Install harvesters
- Install rabbitmq (optional)
- To only run harvesters locally, you do not have to install rabbitmq
- Create and enter virtual environment for scrapi, and go to the top level project directory. From there, run
$ pip install -r requirements.txt
and the python requirements for the project will download and install.
note: JDK 7 must be installed for Cassandra and Elasticsearch to run
$ brew install cassandra
$ brew install elasticsearch
-
Check which version of Java is installed by running the following command:
$ java -version
Use the latest version of Oracle Java 7 on all nodes.
-
Add the DataStax Community repository to the /etc/apt/sources.list.d/cassandra.sources.list
$ echo "deb http://debian.datastax.com/community stable main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
-
Add the DataStax repository key to your aptitude trusted keys.
$ curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
-
Install the package.
$ sudo apt-get update $ sudo apt-get install dsc20=2.0.11-1 cassandra=2.0.11
-
Download and install the Public Signing Key.
$ wget -qO - https://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
-
Add the ElasticSearch repository to yout /etc/apt/sources.list.
$ sudo add-apt-repository "deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main"
-
Install the package
$ sudo apt-get update $ sudo apt-get install elasticsearch
Now, just run
$ cassandra
$ elasticsearch
Or, if you'd like your cassandra session to be bound to your current session, run:
$ cassandra -f
and you should be good to go.
(Note, if you're developing locally, you do not have to run Rabbitmq!)
$ brew install rabbitmq
$ sudo apt-get install rabbitmq-server
You will need to have a local copy of the settings
cp scrapi/settings/local-dist.py scrapi/settings/local.py
(Note: only needed if NOT running locally!)
- from the top-level project directory run:
$ invoke beat
to start the scheduler, and
$ invoke worker
to start the worker.
Run all harvesters with
$ invoke harvesters
or, just one with
$ invoke harvester harvester-name
Invove a harvester a certain number of days back with the --days
argument. For example, to run a harvester 5 days in the past, run:
$ invoke harvester harvester-name --days=5
- To run the tests for the project, just type
$ invoke test
and all of the tests in the 'tests/' directory will be run.