hadoop_filecrawler

  1. Start the Hadoop server on localhost.
  2. Unzip threaded_crawler.zip.

Obtain the jar file at this link: https://app.box.com/s/k37anoksz0bjwg5mnfiqjpyabnagft47

  1. Run the program with the command 'java threaded_crawler.GUI'.
  2. The program waits for you to enter the path to be indexed. All files under that path are then indexed and ranked.
  3. Use the GUI.
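
To preview which files sit under a path before giving it to the program in step 2, a small standalone sketch like the one below may help. It is not part of threaded_crawler and does not reflect how the crawler itself walks the path; the class name `PathPreview` and the method `listFiles` are hypothetical, and it only assumes that "all the files in that path" means every regular file found recursively.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of threaded_crawler.
public class PathPreview {

    // Returns every regular file under root (recursively), in sorted order.
    public static List<Path> listFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no path is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : listFiles(root)) {
            System.out.println(file);
        }
    }
}
```

Running `java PathPreview /some/path` prints one file per line, which gives a rough idea of how much content the crawler would be asked to index.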

Note: The Hadoop server configuration is kept at its defaults, i.e. the server runs on localhost at port 9000.
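
Before launching the GUI, it can be useful to confirm that something is listening on the host and port from the note above. The following is a minimal sketch using only the Java standard library; the class name `HadoopPortCheck` and the one-second timeout are assumptions, and a successful connection only shows that the port is open, not that the service behind it is a correctly configured Hadoop server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of threaded_crawler.
public class HadoopPortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default host and port taken from the note above.
        boolean up = isReachable("localhost", 9000, 1000);
        System.out.println(up
                ? "Port 9000 on localhost is accepting connections."
                : "Nothing is listening on localhost:9000; start the Hadoop server first.");
    }
}
```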