ijspider

simple spider for accessing foaf xml documents across journaling sites.

this spider is really experimental, and does not behave itself very well. It uses the standard python threading library. although I have tried to get it to behave and respond to keyboard interrupts, sometimes it won't, so pgrep py | xargs kill -9 may be required.

motivation

FOAF is an xml (i.e. machine readable) format describing links between people popular on journaling sites. I want to mine this data in order to see information about links between users.

todo

store data in some cache/sqlite3 db for offline processing
get threads to listen to some event for shutdown
improve performance, fix queue locking get/put mess
investigate zeromq, multiprocessing and similar

example

py ijspider.py <username> | egrep --line-buffered --color "<sha1hash>|$"

jonnu/ijspider

ijspider

motivation

todo

example