Hadoop Jobs Profiler

This Scala program profiles Hadoop jobs by extracting data from properly parsed execution logs. The resulting profiles can then be used to check whether sessions perform as expected.

Code in this repository is licensed under the Apache License, version 2.0.

Usage Note

Alongside the required files output by LogParser, you can also provide several optional files:

appId.txt: a text file with tab-separated lines reporting the class name as the first field, followed by a list of job IDs.
appUsers.txt: a text file with tab-separated lines reporting the class name and its number of concurrent users.
dependencies.json: a JSON file mapping profiles to dependency graphs. Dependency graphs should be structured as a map from dependent stages to lists of dependencies.
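
To make the shape of these optional files more concrete, here is a minimal Scala sketch of how their contents could be read. It is only an illustration, not the Profiler's actual parsing code: the object and method names, the sample class name `R1`, the job IDs, and the stage names are all hypothetical, and the sketch assumes that the job IDs in appId.txt are themselves separated by tabs.

```scala
// Hypothetical sketch: none of these names come from the Profiler source.
object OptionalFilesSketch {

  // One line of appId.txt: the class name, then the job IDs.
  // Assumption: the job IDs are further tab-separated fields.
  def parseAppIdLine(line: String): (String, List[String]) = {
    val fields = line.split("\t").toList
    (fields.head, fields.tail)
  }

  // One line of appUsers.txt: the class name and its number of concurrent users.
  def parseAppUsersLine(line: String): (String, Int) = {
    val fields = line.split("\t")
    (fields(0), fields(1).trim.toInt)
  }

  // In-memory shape of one dependency graph from dependencies.json:
  // each dependent stage maps to the list of stages it depends on.
  type DependencyGraph = Map[String, List[String]]

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, for illustration only.
    val (className, jobIds) = parseAppIdLine("R1\tjob_0001\tjob_0002")
    val (userClass, users) = parseAppUsersLine("R1\t3")
    val graph: DependencyGraph = Map("Reduce" -> List("Map"))

    println(s"$className -> $jobIds")
    println(s"$userClass has $users concurrent users")
    println(graph)
  }
}
```

Under these assumptions, dependencies.json would hold one such graph per profile, keyed by the profile's name.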
