How to get the application running:
1) Create a cluster.
2) Upload WordCount.jar to the cluster.
3) Upload the data files to one directory (in the cluster SSH terminal).
(This can be a two-level directory. For example, the first level is called "inputData" and the second level contains directories such as Hugo, Histories and Tolstoy.)
Type in the SSH terminal:
tar xfz Hugo.tar.gz
tar xfz Tolstoy.tar.gz
tar xfz shakespeare.tar.gz
hadoop fs -put WordCount.jar /
hadoop fs -put /histories/. /tmp/inputData
hadoop fs -put /comedies/. /tmp/inputData
hadoop fs -put /tragedies/. /tmp/inputData
hadoop fs -put /poetry/. /tmp/inputData
hadoop fs -put /Tolstoy/. /tmp/inputData
hadoop fs -put /Hugo/. /tmp/inputData
To run the program, type in the GCP cluster SSH terminal:
hadoop jar WordCount.jar /tmp/inputData/ /tmp/output8/
To retrieve the output from HDFS into the GCP SSH terminal and view it:
hadoop fs -get /tmp/output8
cd output8
more part-r-00000
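Once the output has been retrieved, it can also be read from Java instead of with "more". The following is only a sketch: it assumes the job writes with Hadoop's default TextOutputFormat, which puts a tab between key and value on each line. The class name OutputReader is hypothetical, and the exact layout of the value part depends on how WordCount.jar formats it, which is not described here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WordCount.jar or MyGUI.java.
public class OutputReader {

    // Splits each line at the first tab into (key, value).
    // Assumption: lines follow the "key<TAB>value" layout of the
    // default TextOutputFormat. Lines without a tab are skipped.
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue;
            }
            result.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Default path matches the file produced by the retrieval steps above.
        String path = args.length > 0 ? args[0] : "output8/part-r-00000";
        List<String> lines = Files.readAllLines(Paths.get(path));
        parse(lines).forEach((word, value) -> System.out.println(word + " -> " + value));
    }
}
```

A lookup of this kind might be one way for a frontend to find the entry for a word the user typed, provided the output file has been copied to where the frontend runs.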
For the GUI:
Download "input1.txt" and "MyGUI.java",
then run MyGUI.java from the command prompt.
Functions not working:
Not running in Docker
Frontend GUI application cannot find the wordcount
Functions working:
GUI application can receive user input
Inverted index works and gives the word count in each file
No word is duplicated across different reducers
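
The last two points can be illustrated with a small plain-Java sketch. It does not use the Hadoop API and is not the code inside WordCount.jar. The class name InvertedIndexSketch, the tokenization rule and the sample file names are assumptions made for illustration. The partition method uses the same formula as Hadoop's default HashPartitioner, although Hadoop hashes its own key type rather than a Java String, so the actual reducer numbers would differ.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of the map and reduce steps.
public class InvertedIndexSketch {

    // Builds word -> (file name -> number of occurrences in that file).
    public static Map<String, Map<String, Integer>> build(Map<String, String> files) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            // "Map" step: split the text into lowercase tokens.
            // Assumption: a word is a run of letters a-z or digits.
            for (String token : file.getValue().toLowerCase().split("[^a-z0-9]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                // "Reduce" step: add 1 to the count for this word in this file.
                index.computeIfAbsent(token, k -> new TreeMap<>())
                     .merge(file.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    // A given word always yields the same reducer number, so all of its
    // counts end up in one reducer and in one output file.
    public static int partition(String word, int numReducers) {
        return (word.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        Map<String, String> files = new TreeMap<>();
        files.put("a.txt", "War and peace, and war.");
        files.put("b.txt", "Peace at last");
        build(files).forEach((word, counts) -> System.out.println(word + "\t" + counts));
    }
}
```

Because the reducer is chosen from the word alone, the same word is never sent to two different reducers, which is why no duplicates appear across the part-r-* files.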