useful info goes here Requirements * json or simplejson * beautifulsoup verion 3.0 series (it MUST be 3.0 series, not 3.1) http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.0.8.1.tar.gz * solr * sunlightlabs API key Setup: * cp settings.example.py settings.py * create symlinks to settings.py from each of solr/, scraper/ and parser/ * tell solr where to find the schema file. eg, if using running the dev * environment in apache-solr-1.4.1/example/, it will uses schema.xml in the * directory /apache-solr-1.4.1/example/solr/conf. same is true for the * stopwords file. so set up symlinks to he real things, optionally backing up * the originals as .example. cd apache-solr-1.4.1/example/solr/conf mv schema.{,example.}xml mv stopwords.{,example.}txt ln -s /home/cwod/capitolwords/src/solr/schema.xml schema.xml ln -s /home/cwod/capitolwords/src/solr/stopwords.txt stopwords.txt Startup * start up solr. in a dev environment this looks like: cd $SOLR_DIR/example java -jar start.jar (uses jetty)
imclab/Capitol-Words
Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date
PythonBSD-3-Clause