simple spider for accessing foaf xml documents across journaling sites.
this spider is really experimental, and does not behave itself very well. It uses the standard
python threading library. although I have tried to get it to behave and respond to keyboard
interrupts, sometimes it won't, so pgrep py | xargs kill -9
may be required.
FOAF is an xml (i.e. machine readable) format describing links between people popular on journaling sites. I want to mine this data in order to see information about links between users.
- store data in some cache/sqlite3 db for offline processing
- get threads to listen to some event for shutdown
- improve performance, fix queue locking get/put mess
- investigate zeromq, multiprocessing and similar
py ijspider.py <username> | egrep --line-buffered --color "<sha1hash>|$"