Docker Notes

Please follow the instructions in the lab section to install Docker and run Docker containers.

After installing Docker on Ubuntu, open a terminal and list the existing Docker images

$ docker images 

After installing Docker on Ubuntu, you may run into a "docker: Got permission denied" error. The following command addresses it by making the Docker socket readable and writable for all users. Run the same command again every time you restart your computer.

$ sudo chmod 666 /var/run/docker.sock
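
Opening the socket to every user is quick but broad. A commonly recommended alternative is to add your user to the docker group (sudo usermod -aG docker $USER, then log out and back in), which persists across reboots. The sketch below only diagnoses the situation; the helper name docker_sock_status and the DOCKER_SOCK variable are hypothetical and not part of Docker.

```shell
# Hypothetical helper (not part of Docker): classify access to the Docker socket.
# Prints "missing", "ok", or "denied" for the socket path given as $1.
docker_sock_status() {
  if [ ! -e "$1" ]; then
    echo "missing"      # no socket: the Docker service is probably not running
  elif [ -r "$1" ] && [ -w "$1" ]; then
    echo "ok"           # the current user can read and write the socket
  else
    echo "denied"       # the situation behind "Got permission denied"
  fi
}

# DOCKER_SOCK is an illustrative variable; the default is the path used above.
DOCKER_SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"
echo "Docker socket status: $(docker_sock_status "$DOCKER_SOCK")"
```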

Running Docker Containers

To run a Docker container (first time)

$ docker run busybox:1.24 # optional echo "hello world"
$ docker run -it --privileged=true \
  --cap-add=SYS_ADMIN \
  -m 8192m -h bootcamp.local \
  --name bigbox -p 2222:22 -p 9530:9530 -p 8888:8888 \
  -v /:/mnt/host \
  sunlab/bigbox:latest

To run an interactive Docker container

$ docker run -i -t busybox:1.24

To restart a Docker container (existing containers)

First, list all existing Docker containers, including stopped ones

$ docker container ls -a

or, equivalently

$ docker ps -a

Then restart a specific container by ID or name (you don't need the full ID; the first few characters are enough)

$ docker start <CONTAINER ID or NAME> 

Then attach to it with

$ docker attach <CONTAINER ID or NAME>
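
The list, start, and attach steps above can be combined. A minimal sketch, assuming a container named bigbox as created earlier; container_exists and restart_container are hypothetical helper names, while docker ps --format and docker start -a -i are standard Docker CLI options.

```shell
# Hypothetical helper: succeed if the name in $1 is one of the lines on stdin.
container_exists() {
  grep -qxF -- "$1"
}

# Hypothetical helper: start an existing container and attach to it in one step.
# docker start -a -i attaches the terminal and keeps stdin open.
restart_container() {
  if docker ps -a --format '{{.Names}}' | container_exists "$1"; then
    docker start -a -i "$1"
  else
    echo "no container named $1; create it with docker run first" >&2
    return 1
  fi
}

# Usage (uncomment to run; assumes the bigbox container exists):
# restart_container bigbox
```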

How To Remove Docker Containers, Images, Volumes, and Networks

vscode Developing inside a Container

vscode Working with Docker

Run docker image with vscode

Every time you restart the container, you must start all the necessary services again before running any HDFS-related operations.

Start all necessary services

# /scripts/



  • install git
$ sudo apt install git
  • git version
$ git --version

vscode with docker extension

  • in terminal
$ code . # to open current directory in vscode
Configure git

$ git config --global user.name "Your Name"
$ git config --global user.email "<your email address>"

Bind Mounting

  • Maps a host file or directory to a container file or directory
  • Basically just two locations pointing to the same file(s)
  • Again, skips the union file system (UFS), and host files overwrite any in the container
  • Can't be used in a Dockerfile; must be specified at container run
$ ... run -v /Users/wanli/stuff:/path/container (mac/linux)
$ ... run -v //c/Users/wanli/stuff:/path/container (windows)

For example, the bigbox command mounts the host root directory at /mnt/host inside the container

$ docker run -it --privileged=true \
  --cap-add=SYS_ADMIN \
  -m 8192m -h bootcamp.local \
  --name bigbox -p 2222:22 -p 9530:9530 -p 8888:8888 \
  -v /:/mnt/host \
  sunlab/bigbox:latest
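
Docker treats the host side of -v as a bind mount only when it is an absolute path; a bare name is interpreted as a named volume instead. The sketch below builds the host:container argument from a possibly relative directory. bind_spec is a hypothetical helper, and /data is an arbitrary container path chosen for illustration.

```shell
# Hypothetical helper: print an absolute "host_dir:container_dir" bind-mount spec.
# $1 = host directory (may be relative), $2 = path inside the container.
bind_spec() {
  host_dir=$(CDPATH= cd -- "$1" && pwd) || return 1   # resolve to an absolute path
  echo "${host_dir}:$2"
}

# Same mapping as the bigbox command: host root mounted at /mnt/host.
bind_spec / /mnt/host   # prints /:/mnt/host

# Usage with Docker (uncomment to run; /data is an illustrative container path):
# docker run --rm -v "$(bind_spec . /data)" busybox:1.24 ls /data
```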

Homework 1 Environment

Use the provided .yml file to create a new conda environment

$ conda env create -f environment.yml

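Activate the conda environment (newer conda releases use conda activate homework1 instead)

$ source activate homework1

Deactivate the conda environment (newer conda releases use conda deactivate instead)

$ source deactivate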
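
To confirm which environment is active, you can inspect the CONDA_DEFAULT_ENV variable, which conda sets on activation. A minimal sketch; conda_env_active is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the conda environment named in $1 is active.
conda_env_active() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if conda_env_active homework1; then
  echo "homework1 is active"
else
  echo "homework1 is not active; run: source activate homework1"
fi
```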

HDFS Operations

First, you will need to switch to the hdfs user via

# sudo su - hdfs

Then, you can create a directory and change ownership of the newly created folder

> hdfs dfs -mkdir -p /user/<username> # username is root
> hdfs dfs -chown <username> /user/<username> # username is root
> exit

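Similar to creating a local directory with the Linux command mkdir, create a folder named input in HDFS with

> hdfs dfs -mkdir input

To create directories at absolute paths, including any missing parent directories, use -p

> hdfs dfs -mkdir -p /input/events
> hdfs dfs -mkdir -p /input/mortality

Then change the owner of /input recursively

> hdfs dfs -chown -R root /input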

To list HDFS files

> hdfs dfs -ls input
> hdfs dfs -ls /input/events
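
HDFS resolves a relative path such as input against the current user's home directory, /user/<username> by default, so input and /input are different directories. The rule can be sketched as follows; hdfs_resolve is a hypothetical helper for illustration, not an HDFS command.

```shell
# Hypothetical helper: show the absolute HDFS path a given path refers to.
# $1 = path as typed, $2 = HDFS user name.
hdfs_resolve() {
  case "$1" in
    /*) echo "$1" ;;            # absolute paths are used as-is
    *)  echo "/user/$2/$1" ;;   # relative paths live under /user/<username>
  esac
}

hdfs_resolve input root           # prints /user/root/input
hdfs_resolve /input/events root   # prints /input/events
```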

Suppose you followed the previous instructions and created a directory named input. You can then copy data from the local file system to HDFS using -put. For example,

> cd /bigdata-bootcamp/data
> hdfs dfs -put case.csv input
> hdfs dfs -put control.csv input
> hdfs dfs -put /mnt/host/Users/{USERNAME}/path/to/file /input/events
> hdfs dfs -put /mnt/host/home/wanli/cse6250/bigdata4health/homework2/data/events.csv /input/events
> hdfs dfs -put /mnt/host/home/wanli/cse6250/bigdata4health/homework2/data/mortality.csv /input/mortality
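
When several files go to the same HDFS directory, a small loop avoids repeating the command. A minimal sketch: put_all and the HDFS_CMD variable are hypothetical names introduced here, and the file names reuse the examples above.

```shell
# HDFS_CMD is an illustrative override so the loop can be dry-run with echo.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Hypothetical helper: upload every existing local file in $2.. to HDFS dir $1.
put_all() {
  dest="$1"
  shift
  for f in "$@"; do
    if [ -f "$f" ]; then
      $HDFS_CMD dfs -put "$f" "$dest" || return 1   # stop on the first failure
    else
      echo "skipping missing file: $f" >&2
    fi
  done
}

# Usage (uncomment inside the container, after cd /bigdata-bootcamp/data):
# put_all input case.csv control.csv
```

Setting HDFS_CMD=echo before calling put_all prints the commands instead of running them, which is a cheap way to check the file list first.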

To remove files from HDFS

> hdfs dfs -rm -R /path/to/HDFS/file

Run Hive scripts

hive -f sample.hql

Run Pig scripts

Pig tutorials

pig -x local sample.pig

