twitter/hraven

Update MRJobDescFactory/getSubmitTimeMillisFromJobHistory to check for hadoop2 submit time

Closed this issue · 1 comments

A couple of places need to be updated in the code for setting runId for Map Reduce jobs.

For a Map Reduce job, the run ID is set based on the submit time in the config. Currently the submit time conf param being checked is mapred.app.submitted.timestamp, but this does not seem to exist in hadoop2. I will find out the corresponding hadoop2 config param. Also, the function getSubmitTimeMillisFromJobHistory(byte[] jobHistoryRaw) in JobHistoryRawService is outdated. It should be updated to look for the new offset in the 2.0 history file, and it should probably be moved into the hraven-etl module.
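To illustrate the config-side lookup, here is a minimal sketch that tries a list of candidate keys in order and falls back to 0 when none is usable. It is not the hRaven code: the class and method names are hypothetical, a plain Map stands in for the job configuration, and the hadoop2 key is passed in by the caller because the issue does not name it.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of hRaven.
public class SubmitTimeFromConf {

    /**
     * Returns the first parseable submit time found under any of the
     * candidate keys, or 0 if no key is present with a numeric value.
     */
    static long submitTimeMillis(Map<String, String> conf, List<String> candidateKeys) {
        for (String key : candidateKeys) {
            String value = conf.get(key);
            if (value == null) {
                continue; // this job did not set the key
            }
            try {
                return Long.parseLong(value.trim());
            } catch (NumberFormatException e) {
                // Malformed value: fall through and try the next key.
            }
        }
        return 0L; // caller must obtain the submit time some other way
    }
}
```

A caller would list mapred.app.submitted.timestamp first and append the hadoop2 key once it is known. A return value of 0 signals that the submit time has to come from the job history file instead.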

Certain Map Reduce jobs have the param mapred.app.submitted.timestamp set, whereas others don't. Since the submit time is needed well before the entire job history file is parsed, getSubmitTimeMillisFromJobHistory(byte[] jobHistoryRaw) has to do some byte seeking (or something similar) on the raw history bytes to find it. So this is not as trivial as simply updating the code for a new job conf param.
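One possible shape for that byte seeking is sketched below: scan the raw bytes for a marker and read the digits that follow, without parsing the whole file. This is not the actual hRaven implementation. The class name is hypothetical, and both markers are assumptions: that a hadoop1 history file contains SUBMIT_TIME="..." and that a hadoop2 history file contains "submitTime": followed directly by the number.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the hRaven JobHistoryRawService code.
public class SubmitTimeScanner {
    // Assumed markers; verify against real history files before relying on them.
    static final byte[] HADOOP1_MARKER = "SUBMIT_TIME=\"".getBytes(StandardCharsets.UTF_8);
    static final byte[] HADOOP2_MARKER = "\"submitTime\":".getBytes(StandardCharsets.UTF_8);

    /** Naive search: index of the first occurrence of pattern in data, or -1. */
    static int indexOf(byte[] data, byte[] pattern) {
        if (pattern.length == 0) {
            return -1;
        }
        outer:
        for (int i = 0; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Reads the run of ASCII digits right after the marker; 0 if not found. */
    static long readDigitsAfter(byte[] data, byte[] marker) {
        int start = indexOf(data, marker);
        if (start < 0) {
            return 0L;
        }
        int pos = start + marker.length;
        long value = 0L;
        int digits = 0;
        // Cap at 18 digits so the accumulation cannot overflow a long.
        while (pos < data.length && data[pos] >= '0' && data[pos] <= '9' && digits < 18) {
            value = value * 10 + (data[pos] - '0');
            pos++;
            digits++;
        }
        return digits == 0 ? 0L : value;
    }

    /** Tries the assumed hadoop2 marker first, then the assumed hadoop1 one. */
    static long findSubmitTimeMillis(byte[] jobHistoryRaw) {
        if (jobHistoryRaw == null) {
            return 0L;
        }
        long submitTime = readDigitsAfter(jobHistoryRaw, HADOOP2_MARKER);
        return submitTime != 0L ? submitTime : readDigitsAfter(jobHistoryRaw, HADOOP1_MARKER);
    }
}
```

Scanning for a marker rather than a fixed offset keeps the lookup independent of how long the preceding fields are, at the cost of a linear pass over the bytes up to the first match.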

merged as part of #71