Bash scripts for several Hadoop tasks
Script | Description |
---|---|
hdfs-download-dir.sh | Downloads a complete HDFS directory and validates each file's size and MD5 checksum to verify integrity. |
hive-download-partition.sh | Downloads a specific Hive table partition from HDFS. |
day-loop.sh | Loops over days and calls a shell script with custom arguments and a --date (YYYY-MM-DD) parameter. |
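
To illustrate the day-loop idea, here is a minimal sketch. It is not the actual content of day-loop.sh: the function name `day_loop` and its argument order are hypothetical, and it assumes GNU `date` (for the `-d` option).

```shell
# Hypothetical sketch, not the real day-loop.sh. Requires GNU date.
# Usage: day_loop START END COMMAND [ARGS...]
# Runs COMMAND ARGS... --date YYYY-MM-DD once per day, START and END inclusive.
day_loop() {
  local current end_num
  # Normalize the start date and turn the end date into a comparable integer.
  current="$(date -u -d "$1" +%Y-%m-%d)" || return 1
  end_num="$(date -u -d "$2" +%Y%m%d)" || return 1
  shift 2
  # YYYYMMDD values compare correctly as plain integers.
  while [ "$(date -u -d "$current" +%Y%m%d)" -le "$end_num" ]; do
    # Stop at the first day for which the command fails.
    "$@" --date "$current" || return 1
    # -u keeps the arithmetic in UTC, so DST changes cannot skip or repeat a day.
    current="$(date -u -d "$current +1 day" +%Y-%m-%d)"
  done
}
```

For example, `day_loop 2020-02-27 2020-03-01 echo run` prints one `run --date ...` line for each of the four days in that range.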
The scripts assume GNU tools, so they should work out of the box on systems where those are the default.
If you are using Mac OS X, install the GNU tools with Homebrew:
brew install gnu-sed --with-default-names
brew install coreutils
Then add the following line to .bash_profile, .bashrc, or whichever file defines your $PATH, so that the GNU utils are available under their original names rather than prefixed with a g:
export PATH="$(brew --prefix coreutils)/libexec/gnubin:$PATH"
The Mac OS X setup has not been tested.
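
One way to check whether the GNU versions are the ones found on your $PATH is sketched below. The helper names `is_gnu` and `check_gnu_tools` are hypothetical and not part of this repository. The check relies on GNU tools printing "GNU" in their `--version` output, which the BSD variants shipped with Mac OS X generally do not.

```shell
# Hypothetical helpers, not part of the scripts in this repository.
# is_gnu TOOL: succeeds if TOOL's --version output mentions GNU.
is_gnu() {
  "$1" --version 2>/dev/null | grep -q 'GNU'
}

# check_gnu_tools TOOL...: reports every tool that does not look like GNU.
# Returns non-zero if at least one tool failed the check.
check_gnu_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! is_gnu "$tool"; then
      echo "warning: $tool does not appear to be the GNU version" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_gnu_tools sed date md5sum` checks three tools; which tools the scripts actually call is an assumption here, so adjust the list as needed.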