a multi-threaded spider with a web interface
first, make sure you pip install the requirements:
pip install httplib2
pip install lxml
pip install -e git+https://github.com/coleifer/django-utils.git#egg=djutils
pip install -e git+https://github.com/coleifer/django-spider.git#egg=spider
add djutils and spider to INSTALLED_APPS in your settings file, then run manage.py syncdb.
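as a rough sketch, the relevant part of the settings file might look like the fragment below. only the djutils and spider entries come from this README; the django.contrib entries are an assumption (the admin is used in the urlconf, and it normally needs these), so keep whatever your project already lists.

```python
# settings.py (fragment) -- illustrative only, adapt to your project
INSTALLED_APPS = (
    # assumed: apps the django admin normally depends on
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.admin',
    # the two apps this README asks you to add
    'djutils',
    'spider',
)
```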
add spider.urls to your root urlconf:
from django.conf import settings
from django.conf.urls.defaults import *
from django.contrib import admin
admin.autodiscover()
urlpatterns = patterns('',
    url(r'^admin/', include(admin.site.urls)),
    url(r'', include('spider.urls')),
)
make sure the media in the spider app is copied into your static media directory.
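if you would rather script the copy than do it by hand, a minimal sketch follows. the helper name copy_app_media and both paths in the usage comment are hypothetical, not part of the spider app; point them at wherever the app's media and your static media directory actually live. it relies on shutil.copytree with dirs_exist_ok, which needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_app_media(app_media_dir, static_media_dir):
    """Copy the tree under app_media_dir into static_media_dir.

    Existing files with the same name are overwritten. Returns the
    files now present under static_media_dir, as sorted relative paths.
    """
    src = Path(app_media_dir)
    dest = Path(static_media_dir)
    if not src.is_dir():
        raise FileNotFoundError("media directory not found: %s" % src)
    # dirs_exist_ok lets the copy merge into an existing destination
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


# usage (both paths are placeholders, adjust to your layout):
# copy_app_media("./src/spider/spider/media", "./mysite/static/spider")
```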
start up the task queue:
# assume your cwd is the root dir of virtualenv
export DJANGO_SETTINGS_MODULE=mysite.settings
./bin/python ./src/djutils/djutils/queue/bin/consumer.py start -l ./logs/queue.log -p ./run/queue.pid