Please follow the instructions in the lab section to install Docker and download the required Docker image.
Check existing container information:
$ docker ps -a
To start a container again
$ docker start <CONTAINER ID or NAME>
Then attach to it:
$ docker attach <CONTAINER ID or NAME>
Every time you restart your container, you must start these services again before performing any HDFS-related operations.
# /scripts/start-services.sh
Verify the Hadoop installation:
$ hadoop version
First, you will need to switch to the hdfs user via
# sudo su - hdfs
Then, you can create a directory and change ownership of the newly created folder:
> hdfs dfs -mkdir -p /user/<username> # username is root
> hdfs dfs -chown <username> /user/<username> # username is root
> exit
Similar to creating a local directory with the Linux command mkdir, create a folder named input in HDFS with:
> hdfs dfs -mkdir input
or
> hdfs dfs -mkdir -p /input/events
> hdfs dfs -mkdir -p /input/mortality
and
> hdfs dfs -chown -R root /input
To list HDFS files:
> hdfs dfs -ls input
> hdfs dfs -ls input/events
Suppose you followed the previous instructions and created a directory named input; you can then copy data from the local file system to HDFS using -put. For example,
> cd /bigdata-bootcamp/data
> hdfs dfs -put case.csv input
> hdfs dfs -put control.csv input
> hdfs dfs -put /mnt/host/Users/{USERNAME}/path/to/file /input/events
> hdfs dfs -put /mnt/host/home/wanli/cse6250/bigdata4health/homework2/data/events.csv /input/events
> hdfs dfs -put /mnt/host/home/wanli/cse6250/bigdata4health/homework2/data/mortality.csv /input/mortality
To remove files from HDFS:
> hdfs dfs -rm -R /path/to/HDFS/file
For the homework, create an /hw2 directory in HDFS and upload the training data and code:
> hdfs dfs -mkdir -p /hw2
> hdfs dfs -put /mnt/host/home/wanli/cse6250/bigdata4health/homework2/code/pig/training/ /hw2
> hdfs dfs -put /mnt/host/home/wanli/cse6250/bigdata4health/homework2/code/lr/ /hw2
> hdfs dfs -ls /hw2
> hdfs dfs -chown -R root /hw2
From the local file system (outside the hdfs user shell), run the Hadoop Streaming training job:
# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -D mapreduce.job.reduces=5 -files lr -mapper "python lr/mapper.py -n 5 -r 0.4 " -reducer "python lr/reducer.py -f 3618" -input /hw2/training -output /hw2/models
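In this command, `-D mapreduce.job.reduces=5` requests five reducers, `-files lr` ships the local lr directory to every task, and the `-mapper`/`-reducer` commands each receive their task's input on stdin and write key/value output to stdout. The actual logistic-regression code lives in lr/mapper.py and lr/reducer.py (not reproduced here); as a minimal sketch of the streaming contract only, using a word-count example rather than the homework's model, a mapper/reducer pair could look like:

```python
#!/usr/bin/env python
# Sketch of the Hadoop Streaming contract (word count, NOT the homework's
# logistic-regression code): the mapper reads raw lines from stdin and emits
# tab-separated key/value pairs; Hadoop sorts the pairs by key and streams
# them to the reducer, which aggregates each run of identical keys.
from itertools import groupby

def mapper(lines):
    """Emit 'word\t1' for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)

def reducer(sorted_pairs):
    """Sum the counts for each run of identical keys (input must be sorted)."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(int(v) for _, v in group))

if __name__ == "__main__":
    # Simulate Hadoop's shuffle phase locally: map, sort by key, then reduce.
    mapped = sorted(mapper(["hello world", "hello hdfs"]))
    for out in reducer(mapped):
        print(out)  # hdfs 1, hello 2, world 1
```

Locally you can smoke-test any streaming pair the same way Hadoop will run it: `cat data | python mapper.py | sort | python reducer.py`.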
Copy the trained models to the local file system:
# hdfs dfs -get /hw2/models
Then test the ensemble on the testing data:
# cat pig/testing/* | python lr/testensemble.py -m models/
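The exact file formats expected by lr/testensemble.py are not shown here. As a hedged illustration only (all names below are hypothetical, not the homework's code), a logistic-regression ensemble typically averages each model's predicted probability for an example and thresholds the average at 0.5:

```python
import math

def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Probability from one logistic-regression model.

    weights  -- dict mapping feature index -> learned weight
    features -- dict mapping feature index -> feature value
    """
    z = sum(weights.get(i, 0.0) * v for i, v in features.items())
    return sigmoid(z)

def ensemble_predict(models, features):
    """Average the probabilities of several models; threshold at 0.5."""
    avg = sum(predict(w, features) for w in models) / len(models)
    return 1 if avg >= 0.5 else 0
```

Here each of the five reducers would have produced one model (one weight vector), and the ensemble smooths out the variance of any single model's prediction.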
Run a Hive script:
$ hive -f sample.hql
Run a Pig script in local mode:
$ pig -x local sample.pig