Cat-facts as a Service! This project provides a RESTful API to query a MongoDB database for cat-facts scraped across the web.
See the hosted site here, or index.html in the django/caas_app/templates/ directory.
CaaS is written in Python 2 with Django and assumes a MongoDB backend. It depends on the following:
- django
- pymongo version 2.8
- mongoengine version 0.9.0
- phantomjs (a simple apt-get install phantomjs on Ubuntu; otherwise check the packages for your distro)
- selenium
Install them all at once:
pip install pymongo==2.8 mongoengine==0.9.0 django selenium
- Install MongoDB (if not done already) and add a new caas database from the MongoDB shell: use caas
- Copy django/db_auth.template.json to django/db_auth.json, and edit the username, password, and any other required fields to match your database settings. Do the same in the data/ directory.
- Run data/db-insert.py against data/db_auth.json and all .json files in the data/ directory to populate your database.
- Change the value of AUTHFILE_LOCATION in django/caas/settings/settings.py to match the absolute path of db_auth.json in your django/ directory.
- Hook up the django/caas_app Django application to the web server of your choice, or use python manage.py runserver <ip>:<port> to run the app with the built-in Django web server.
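Before running the insert script or the app, it can help to confirm that each db_auth.json is filled in. The following is a minimal sketch, not part of the project: it assumes the file is a flat JSON object and checks only the username and password fields named above. The function name load_db_auth is hypothetical, and any other fields your template requires are not checked.

```python
import json

# Only the two fields this README names explicitly; the template may
# require more, which this sketch does not know about.
REQUIRED_FIELDS = ("username", "password")


def load_db_auth(path):
    """Load a db_auth.json file and fail early if a required field is empty.

    Assumes the file holds a flat JSON object (an assumption, not
    something the project documents here).
    """
    with open(path) as handle:
        settings = json.load(handle)
    missing = [name for name in REQUIRED_FIELDS if not settings.get(name)]
    if missing:
        raise ValueError("db_auth.json is missing: " + ", ".join(missing))
    return settings
```

Calling load_db_auth("django/db_auth.json") and load_db_auth("data/db_auth.json") after editing both copies would surface a forgotten field before MongoDB rejects the connection.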
Scrapers are used to scrape specific sources for cat-facts and output JSON files, ready to be inserted into the database. They are written in Python 2 (but compatible with Python 3) and can be found in the scrapers/ directory. To be compatible with the data/db-insert.py insertion script, output filenames should be prefixed by <coll_name>_, where coll_name is one of the target collections detailed below (without the db. prefix).
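The naming convention can be sketched as a pair of small helpers. This is an illustration only, not code from scrapers/ or data/db-insert.py: the function names and the label argument are hypothetical, and the collection names are the six listed in this README.

```python
import os

# The six target collections named in this README.
COLLECTIONS = ("catfact", "meta", "intro", "newsub", "unsub", "notrecog")


def output_filename(coll_name, label):
    """Build a scraper output filename of the form <coll_name>_<label>.json."""
    if coll_name not in COLLECTIONS:
        raise ValueError("unknown collection: " + coll_name)
    return "{0}_{1}.json".format(coll_name, label)


def collection_for(path):
    """Recover the target collection from a filename's <coll_name>_ prefix."""
    name = os.path.basename(path)
    prefix, separator, _rest = name.partition("_")
    if not separator or prefix not in COLLECTIONS:
        raise ValueError("no known collection prefix in: " + name)
    return prefix
```

For example, output_filename("catfact", "example") gives "catfact_example.json", and collection_for("data/meta_example.json") gives "meta".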
The database has six collections: catfact, meta, intro, newsub, unsub, and notrecog.
The catfact collection contains the actual text of the cat-fact.
- _id: The MD5 hash of the cat-fact text, truncated to 24 characters.
- text: The text of the cat-fact.
The meta collection contains the metadata of the cat-fact.
- _id: The MD5 hash of the cat-fact text, truncated to 24 characters.
- source: The human-readable source of the cat-fact (e.g. "Steakscorp Labs").
- url: The specific URL where the cat-fact text was scraped.
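Because the catfact and meta collections use the same _id, a scraper can derive both records from one piece of text. The sketch below is an assumption-laden illustration, not the project's code: it assumes the hash is the hexadecimal MD5 digest of the UTF-8 encoded text, and the names catfact_id and build_documents are hypothetical.

```python
import hashlib


def catfact_id(text):
    """Hex MD5 digest of the text, truncated to 24 characters.

    UTF-8 encoding and the hex form of the digest are assumptions.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:24]


def build_documents(text, source, url):
    """Return the (catfact, meta) records for one fact, sharing one _id."""
    doc_id = catfact_id(text)
    catfact = {"_id": doc_id, "text": text}
    meta = {"_id": doc_id, "source": source, "url": url}
    return catfact, meta
```

Deriving the _id from the text means the same fact scraped twice maps to the same key. The sketch keeps the _id as a plain string; whether the project stores it as a string or converts it to another type is not stated here.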
The intro collection contains intro text (see API) to be inserted before the response text if intro=yes was specified in the API query.
- text: The intro text to be inserted before the response text. The actual text included in the response will be chosen randomly from this collection.
The newsub collection contains new subscription text (see API) to be inserted before the response text if newsub=yes was specified in the API query.
- text: The new subscription text to be inserted before the response text. The actual text included in the response will be chosen randomly from this collection.
The unsub collection contains unsubscription text (see API) to be inserted after the response text if unsub=yes was specified in the API query.
- text: The unsubscription text to be inserted after the response text. The actual text included in the response will be chosen randomly from this collection.
The notrecog collection contains "command not recognized" error messages (see API) to be inserted before the response text if notrecog=yes was specified in the API query.
- text: The "command not recognized" text to be inserted before the response text. The actual text included in the response will be chosen randomly from this collection.
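How the four optional texts wrap a response can be sketched as follows. This is a simplified illustration rather than the app's actual view code: the function name build_response, the in-memory pools dictionary standing in for the collections, the order of the texts placed before the fact, and joining with single spaces are all assumptions.

```python
import random

# Collections whose text goes before the fact. The relative order of
# these three is an assumption; the README only says "before".
BEFORE = ("notrecog", "newsub", "intro")


def build_response(fact_text, pools, params, rng=random):
    """Wrap fact_text with randomly chosen optional texts.

    pools  maps a collection name to a list of candidate texts.
    params maps a query parameter name to its value, e.g. {"intro": "yes"}.
    """
    parts = []
    for name in BEFORE:
        if params.get(name) == "yes" and pools.get(name):
            parts.append(rng.choice(pools[name]))
    parts.append(fact_text)
    # Unsubscription text is the only one placed after the fact.
    if params.get("unsub") == "yes" and pools.get("unsub"):
        parts.append(rng.choice(pools["unsub"]))
    return " ".join(parts)
```

With made-up placeholder texts, build_response("<fact>", {"intro": ["<intro>"], "unsub": ["<unsub>"]}, {"intro": "yes", "unsub": "yes"}) returns "<intro> <fact> <unsub>". When a pool holds several texts, random.choice picks one per call, matching the random selection described for each collection.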