AWS (EMR): Hive-Tez queries triggered from Oozie are not analyzed
labbedaine opened this issue · 3 comments
Hi.
I am new to Dr. Elephant and would love to make it a permanent tool in our solution. For info, I have requested to join the Google Group (https://groups.google.com/forum/#!forum/dr-elephant-users), but my membership is still pending, which is why I am opening an issue.
Our company uses EMR on AWS to process large volumes of data. I discovered Dr. Elephant through this AWS blog post: https://aws.amazon.com/blogs/big-data/tune-hadoop-and-spark-performance-with-dr-elephant-and-sparklens-on-amazon-emr/
I followed the step-by-step guide and it worked: Spark and Hive jobs were analyzed (please note that all of these jobs were triggered manually). I am now installing Dr. Elephant as a permanent solution on all our clusters.
After the first successful run, only the Spark jobs are picked up by the tool; none of the Hive-Tez queries show up. The difference between my tests and the cluster is that on the cluster the Hive queries are triggered by Oozie. If I run queries through the Hive CLI, they are analyzed.
Here are the fetchers that are enabled:
<fetcher>
  <applicationtype>tez</applicationtype>
  <classname>com.linkedin.drelephant.tez.fetchers.TezFetcher</classname>
</fetcher>
<fetcher>
  <applicationtype>mapreduce</applicationtype>
  <classname>com.linkedin.drelephant.mapreduce.fetchers.MapReduceFSFetcherHadoop2</classname>
  <params>
    <sampling_enabled>false</sampling_enabled>
    <history_log_size_limit_in_mb>5000</history_log_size_limit_in_mb>
    <history_server_time_zone>UTC</history_server_time_zone>
  </params>
</fetcher>
<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
  <params>
    <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
    <should_process_logs_locally>true</should_process_logs_locally>
  </params>
</fetcher>
I would appreciate some help since I am really excited to use Dr. Elephant.
Thank you.
After digging a bit more, I noticed that the Oozie Web Console (port 11000) is not enabled by default in an AWS EMR installation. I found the steps required to enable the UI and am testing again, but so far without luck.
I found the culprit:
<oozie_api_url>http://localhost:11000/oozie</oozie_api_url>
localhost cannot be used here; the URL must point to the IP address of the master node.
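For reference, a corrected entry might look like the sketch below. It assumes the parameter sits in the oozie scheduler block of Dr. Elephant's scheduler configuration (app-conf/SchedulerConf.xml in a default install) and that Oozie listens on its default port 11000. MASTER_NODE_IP is a placeholder for the master node's actual IP address, not a literal value.

```xml
<scheduler>
  <name>oozie</name>
  <classname>com.linkedin.drelephant.schedulers.OozieScheduler</classname>
  <params>
    <!-- Placeholder: replace MASTER_NODE_IP with the EMR master node's IP address.
         localhost did not work in this setup. -->
    <oozie_api_url>http://MASTER_NODE_IP:11000/oozie</oozie_api_url>
  </params>
</scheduler>
```

One way to confirm the address is reachable from the host running Dr. Elephant is to request Oozie's admin status endpoint (/oozie/v1/admin/status) under the same base URL before restarting the service.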
@labbedaine hope the issue is resolved; closing the ticket.