Timberlake

Timberlake is a Job Tracker for Hadoop.

Primary language: Go. License: MIT.

Intro

Timberlake is a Go server paired with a React.js frontend. It improves on existing Hadoop job trackers by providing a lightweight realtime view of your running and finished MapReduce jobs. Timberlake exposes the counters and configuration that are the most useful, allowing you to get a quick overview of the whole cluster or dig into the performance and behavior of a single job.

It also provides waterfall and boxplot visualizations for jobs. We've found these visualizations helpful for figuring out why a job is slow. Is it launching too many mappers and overloading the cluster? Are reducers launching early and starving the mappers? Does the job have reducer skew? You can use the counters of bytes written, shuffled, and read to understand the network and I/O behavior of your jobs. And when a job crashes, Timberlake shows you tracebacks from the logs to help you debug it.

Timberlake pairs well with Scalding and Cascading. It uses extra data from the Cascading planner to show the relationships between steps, and to clarify which jobs' outputs are used as inputs to other jobs in the flow. Visualizing that flow makes it much easier to figure out which steps are causing bottlenecks.

Finally, we've included a Slackbot that has significantly improved our Hadooping lives. The bot can notify you when your jobs start and finish, and provides links back to Timberlake.

Screenshots

Job Details

List of Jobs

Installation

The best way to install Timberlake is from a release tarball, available on the releases page.

Download a tarball to your server, then extract it and move it into place:

$ tar zxvf timberlake-v1.0.2-linux-amd64.tar.gz
$ mv -T timberlake-v1.0.2-linux-amd64 /opt/timberlake

Now you can start the server:

$ /opt/timberlake/bin/timberlake \
    --bind :8000 \
    --resource-manager-url http://resourcemanager:8088 \
    --history-server-url http://resourcemanager:19888 \
    --namenode-address namenode:9000

And optionally, start the Slackbot:

$ /opt/timberlake/bin/slack \
    --internal-timberlake-url http://localhost:8000 \
    --external-timberlake-url https://timberlake.example.com \
    --slack-url https://hooks.slack.com/services/...

You'll need to create a new Incoming Webhook to generate the Slack URL for your bot.
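
Before starting the bot, it can help to confirm that the webhook URL works. The following is a minimal sketch, not part of Timberlake: it posts a single test message using the JSON body (`{"text": "..."}`) that Slack Incoming Webhooks accept. The names `buildPayload` and `postToWebhook` and the `SLACK_WEBHOOK_URL` environment variable are hypothetical, chosen only for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// slackMessage is the minimal JSON body an Incoming Webhook accepts.
type slackMessage struct {
	Text string `json:"text"`
}

// buildPayload encodes text as a webhook request body.
// (Hypothetical helper; not part of Timberlake.)
func buildPayload(text string) ([]byte, error) {
	return json.Marshal(slackMessage{Text: text})
}

// postToWebhook sends text to webhookURL and returns an error
// unless the webhook answers with HTTP 200.
func postToWebhook(client *http.Client, webhookURL, text string) error {
	body, err := buildPayload(text)
	if err != nil {
		return err
	}
	resp, err := client.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// SLACK_WEBHOOK_URL is an assumed variable name holding the URL
	// generated when you created the Incoming Webhook.
	webhookURL := os.Getenv("SLACK_WEBHOOK_URL")
	if webhookURL == "" {
		fmt.Println("set SLACK_WEBHOOK_URL to your Incoming Webhook URL")
		return
	}
	if err := postToWebhook(http.DefaultClient, webhookURL, "Timberlake webhook test"); err != nil {
		fmt.Println("webhook check failed:", err)
		os.Exit(1)
	}
	fmt.Println("webhook accepted the message")
}
```

If the test message shows up in your channel, the same URL should be safe to pass as `--slack-url`.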

Building from Source

You'll need npm, Go, and Node.js on your PATH.

$ go get -u golang.org/x/lint/golint \
    github.com/colinmarc/hdfs \
    github.com/zenazn/goji \
    github.com/stretchr/testify

$ git clone https://github.com/stripe/timberlake.git
$ cd timberlake
$ make

Limitations

Timberlake only works with the YARN Resource Manager API. It has been tested on Hadoop v2.4.x and v2.5.x, but the Kill Job feature uses an endpoint that is only available in v2.5.x and later.
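
To find out which side of that version boundary a cluster is on, you can ask the Resource Manager directly. The sketch below is not part of Timberlake: it reads `hadoopVersion` from the YARN cluster information endpoint (`/ws/v1/cluster/info`) and compares it against 2.5. The helper names `fetchHadoopVersion` and `supportsKill` are hypothetical, and the parser assumes a version string of the form `major.minor[.rest]`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strconv"
	"strings"
)

// clusterInfo mirrors only the part of GET /ws/v1/cluster/info used here.
type clusterInfo struct {
	ClusterInfo struct {
		HadoopVersion string `json:"hadoopVersion"`
	} `json:"clusterInfo"`
}

// supportsKill reports whether a version such as "2.5.1" or
// "2.6.0-cdh5.4.0" is at least 2.5. (Hypothetical helper.)
func supportsKill(version string) (bool, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("unrecognized version %q", version)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("unrecognized version %q", version)
	}
	return major > 2 || (major == 2 && minor >= 5), nil
}

// fetchHadoopVersion asks the Resource Manager for its Hadoop version.
func fetchHadoopVersion(client *http.Client, rmURL string) (string, error) {
	resp, err := client.Get(strings.TrimRight(rmURL, "/") + "/ws/v1/cluster/info")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("resource manager returned %s", resp.Status)
	}
	var info clusterInfo
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		return "", err
	}
	return info.ClusterInfo.HadoopVersion, nil
}

func main() {
	// Defaults to the Resource Manager URL used in the install example;
	// pass a different URL as the first argument.
	rmURL := "http://resourcemanager:8088"
	if len(os.Args) > 1 {
		rmURL = os.Args[1]
	}
	version, err := fetchHadoopVersion(http.DefaultClient, rmURL)
	if err != nil {
		fmt.Println("could not read cluster info:", err)
		os.Exit(1)
	}
	ok, err := supportsKill(version)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Printf("Hadoop %s, Kill Job expected to work: %v\n", version, ok)
}
```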

Our cluster has 10-40 jobs running simultaneously and runs about 2,000 jobs per day. Timberlake's performance has not been tested outside these bounds.