Fuge is a weekend project written while trying to dig deep into Java threads, synchronization and communication using ConcurrentLinkedQueue
.
Fuges creates 1 job dispatcher thread, 1 job result aggregator, multiple job consumer threads. Fuge use 2 thread-safe queues jobQueue
and resultQueue
for sending job and their results back and forth. Here is how it all gets connected:
dispatcher
thread sends new job overjobQueue
Multipleconsumer
threads listen for incoming jobs onjobQueue
. A consumer accepts 1 job at a time and process itresultQueue
over whichconsumer
threads send job result.aggregator
thread flushes incoming results overresultQueue
.
Crawler is a web crawler writtern on to of Fuge. Crawler setup Fuge and seeds an input url. Input job (url) is received by one of the consumer who result back a Crawler result object containing information about crawled URL. Links found on the job url are feeded back into the Fuge system as input. This goes on endlessly crawling possibly the entire web.
git clone git@github.com:abhinavsingh/fuge.git
cd fuge
mvn clean compile assembly:single
java -jar target/Crawler-jar-with-dependencies.jar http://abhinavsingh.com