Hadoop Interactive Shell

HadooSh

Ever wanted to grep a file on HDFS and store the results locally? Now you can, with one simple command in HadooSh:

cbeavers > cd books/
books > cat Ulysses-part-00000 | grep Mulligan >l localOut.txt

HadooSh is an interactive shell for HDFS built on top of JLine to offer tab completion of commands and paths under Hadoop. It supports piping to local system commands, and both local and remote output. There are many bugs to be found, so please play nicely with it.
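To make the "piping to local system commands" idea concrete, here is a minimal Java sketch of feeding text (for example, the output of an HDFS `cat`) to a local command and collecting what it prints. This is not HadooSh's actual implementation; the class name `LocalPipeSketch` and the method `pipeThrough` are hypothetical, and the example assumes the local command (such as `grep`) is on the PATH.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper for illustration; not part of HadooSh.
final class LocalPipeSketch {

    /** Sends input to the stdin of a local command and returns its stdout. */
    static String pipeThrough(String input, List<String> command) {
        try {
            final Process process = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.INHERIT) // show stderr on the console
                    .start();
            final byte[] bytes = input.getBytes(StandardCharsets.UTF_8);

            // Write stdin on a separate thread so that a full stdout pipe
            // cannot block the writer and deadlock both sides.
            Thread writer = new Thread(() -> {
                try (OutputStream stdin = process.getOutputStream()) {
                    stdin.write(bytes);
                } catch (IOException ignored) {
                    // The command may exit before reading all of its input.
                }
            });
            writer.start();

            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            try (InputStream stdout = process.getInputStream()) {
                byte[] buffer = new byte[4096];
                int n;
                while ((n = stdout.read(buffer)) != -1) {
                    collected.write(buffer, 0, n);
                }
            }
            writer.join();
            process.waitFor();
            return new String(collected.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while piping", e);
        }
    }
}
```

Calling `LocalPipeSketch.pipeThrough(text, Arrays.asList("grep", "Mulligan"))` would return only the lines of `text` that contain "Mulligan". Writing that result to a local file or to HDFS would then correspond to the local and remote output described above.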

Currently supported operations:

  • cd [dir]

  • pwd

  • head [numLines]

  • all FsShell commands (hadoop fs), such as

    • cat [files]
    • mv <src> <dst>
    • rm [files]
  • avrocat (prints the first ten records from an Avro file)

  • local (to execute one of the above commands on the local FS)

  • support for piping to local commands

  • use ">" to redirect command output to the HDFS filesystem

  • use ">l" to redirect command output to the local filesystem

  • all JobClient commands (hadoop job) are accessible by typing "job" first at the prompt

  • runjar <localjar> ... (hadoop jar)

  • tlog (-job <jobid> |-dir <jobOutputDir>) [-taskpattern taskglob] [-hostpattern hostglob]

     # Show all logs for a job:
     tlog -job job_201306131712_0004
    
     # Show all mapper logs for a teragen job:
     tlog -dir /user/gera/tgen -taskpattern *_m_*
    
     # grep logs for job tasks run on certain nodes
     tlog -job job_201306131712_0004 -hostpattern *.rack.company.com | grep needle
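
The `-taskpattern` and `-hostpattern` options take globs. How HadooSh matches them internally is not documented here, so the following Java sketch only illustrates the general idea, assuming a simple glob dialect in which `*` matches any run of characters and `?` matches a single character. The class `TaskGlobSketch` and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not HadooSh's actual matching code.
final class TaskGlobSketch {

    /** Translates a simple glob ('*' and '?' only) into a regular expression. */
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");          // any run of characters
            } else if (c == '?') {
                regex.append('.');           // exactly one character
            } else {
                // Quote everything else so that '.' or '_' stay literal.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns true if the whole name matches the glob. */
    static boolean matches(String glob, String name) {
        return compile(glob).matcher(name).matches();
    }
}
```

Under these assumptions, `*_m_*` would match a map task attempt ID such as `attempt_201306131712_0004_m_000000_0` but not the corresponding `_r_` (reduce) attempt, and `*.rack.company.com` would match `node7.rack.company.com` (a made-up host name).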

Planned future actions:

  • should probably put some limits on file sizes

Known bugs:

  • piped commands that contain quotes, such as:

     cat file | cut -d" " -f 1
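
The cause of this bug is not stated here. One common source of such problems is splitting a command on whitespace without regard to quotes, which would turn `-d" "` into two broken arguments. The Java sketch below shows quote-aware splitting as one possible approach; `QuoteAwareSplitter` is a hypothetical name, not a class in HadooSh, and it deliberately ignores backslash escapes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of HadooSh.
final class QuoteAwareSplitter {

    /** Splits a command into arguments, keeping quoted whitespace intact. */
    static List<String> split(String command) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 outside quotes, otherwise the opening quote character
        boolean inToken = false; // true once a token has started, even an empty one like ""

        for (int i = 0; i < command.length(); i++) {
            char c = command.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;               // closing quote: drop it
                } else {
                    current.append(c);       // keep everything inside quotes
                }
            } else if (c == '"' || c == '\'') {
                quote = c;                   // opening quote: drop it, start a token
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) {               // unquoted whitespace ends the token
                    args.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (quote != 0) {
            throw new IllegalArgumentException("unterminated quote in: " + command);
        }
        if (inToken) {
            args.add(current.toString());
        }
        return args;
    }
}
```

With this sketch, the local half of the example above, `cut -d" " -f 1`, splits into the four arguments `cut`, `-d ` (with the trailing space preserved), `-f`, and `1`, which could then be handed to something like `ProcessBuilder`.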

Build HadooSh:

git clone https://github.com/gerashegalov/hadoosh.git
cd hadoosh
mvn package

This builds HadooSh against Apache dependencies.

You can also build HadooSh against MapR artifacts:

mvn package -Pmapr

Executing HadooSh:

To use HadooSh, just copy the included jar to your Hadoop cluster's gateway, make sure you've kinit'd if necessary, and run the following:

wget https://github.com/cbeav/hadoosh/raw/master/HadooSh.jar
hadoop jar HadooSh.jar HadooSh

To run HadooSh with Maven:

# symlink hadoop conf
ln -s ${HADOOP_HOME}/conf
mvn clean compile exec:exec -Dexec.executable=java -Dexec.args="-classpath %classpath HadooSh"

Enjoy.

Authors: Chris Beavers, Paul Hobbs, and Gera Shegalov