/whatRequests

script that crawls the what.cd request database and my school library CD catalog and returns matches.

Primary LanguagePython

recursive web crawler that scrapes request titles on what.cd and CD titles at a school library to check for matches.

make sure the mongodb server is running!

to run the what.cd scraper, go into the outer what_crawler directory and run

scrapy crawl what.cd

to run the library scraper, go into the outer library_crawler directory and run

scrapy crawl bobcat

after all of the titles have been scraped, run

mongo localhost:27017/whatRequests --quiet aggregator.js

To change the scraper to run on a different library catalog, library-spider.py will need to modified. Changes will need to be made to allowed_domains, start_urls, and the two XPaths, which can be found using the firebug copy XPath feature or using the XPath chrome extension.

If you're using windows, you will need to download pywin32 because of a bug in Twisted.