Support for Hadoop 3?
theyaa opened this issue · 12 comments
Does Dr. Elephant provide support for Hadoop 3 with Yarn ATS V2 please?
@theyaa, no, Dr. Elephant currently doesn't support Hadoop 3 with ATS v2. But you can use Dr. Elephant with Hadoop 3 in prod, provided that your Yarn REST APIs and history servers are in sync with what Dr. Elephant is expecting.
Kindly try this if you can and let us know the result and reach out in case you need any help.
Hi @ShubhamGupta29, in HDP3 with Hadoop 3, all Hive queries run using the Tez engine, and Tez is built to send query updates/progress to Yarn ATSv2. Using the Yarn timeline server v1 REST API, we cannot get Tez query progress information anymore. We have to use Yarn ATSv2, or read from the query_data and dag_data tables in Hive's sys db.
@theyaa, got the need for ATSv2. I will have a look at all the needs and changes for this requirement and prioritize respectively.
@ShubhamGupta29 thank you very much. Please let me know when you have a working version so I can download and try it out.
@theyaa Is the Tez UI working in your HDP 3 install?
Can you also provide the value of the property tez.history.logging.service.class, which should be present in tez-site.xml?
Thank you.
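For anyone following along, here is a minimal sketch of how that property could be read programmatically. It assumes tez-site.xml uses the usual Hadoop configuration layout (a `configuration` root with `property` elements holding `name` and `value`). The helper name `read_hadoop_property` and the inline sample are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of a property from Hadoop-style configuration XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Each <property> is expected to carry a <name> and a <value> child.
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative stand-in for the contents of tez-site.xml (not a real cluster file).
SAMPLE = """<configuration>
  <property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService</value>
  </property>
</configuration>"""

print(read_hadoop_property(SAMPLE, "tez.history.logging.service.class"))
```

To check a real install, the file contents would be passed in place of `SAMPLE`; the location of tez-site.xml depends on the distribution.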
Hi @shkhrgpt the value is: org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService
@theyaa That may be why the timeline server is not returning data for Tez. org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService doesn't allow data to go to the timeline server, and therefore the timeline API used by the Tez fetcher is not working.
Maybe if you change the value of tez.history.logging.service.class to org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService, it might work, as described here:
https://tez.apache.org/tez-ui.html
I haven't tested it yet so I don't know if it causes any problem. But maybe you can try?
Hi @shkhrgpt This will cause issues with Yarn and Hive logging, since Yarn with Hadoop 3 and HDP3 logs to Yarn ATSv2, and the latter uses Protobuf and writes to HBase. If I switch to the old class for Tez, I will lose that logging and cause issues in Yarn. That is why I was asking if there is a way to modify Dr. Elephant to be able to read from Yarn ATSv2.
Okay @theyaa .
Do you know if the ATSv2 REST API provides the Tez data that was provided by the older ATS REST API?
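One way to check this is to request the same entity type from both APIs and compare the responses. The sketch below only builds the two URLs, using the documented path shapes of the v1 timeline server and the v2 timeline reader. The host names, ports and application id are hypothetical, and whether ATSv2 holds any TEZ_DAG_ID entities on a given cluster is exactly the open question, so this is a probe and not a confirmed integration.

```python
from urllib.parse import quote


def atsv1_tez_dags_url(base_url, limit=10):
    """URL listing Tez DAG entities from the v1 timeline server."""
    return f"{base_url.rstrip('/')}/ws/v1/timeline/TEZ_DAG_ID?limit={int(limit)}"


def atsv2_entities_url(reader_url, app_id, entity_type="TEZ_DAG_ID"):
    """URL listing entities of one type for one application from the v2 timeline reader."""
    app = quote(app_id, safe="")
    etype = quote(entity_type, safe="")
    return f"{reader_url.rstrip('/')}/ws/v2/timeline/apps/{app}/entities/{etype}"


# Hypothetical hosts, ports and application id, for illustration only.
print(atsv1_tez_dags_url("http://timeline-host:8188", limit=5))
print(atsv2_entities_url("http://reader-host:8198", "application_1_0001"))
```

Fetching both URLs (for example with curl) and comparing the JSON would show whether the v2 reader exposes the DAG data the Tez fetcher relies on.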
@theyaa
I wrote a logging service that will write Tez events to both ATSv1 and protobuf. Please check the following if you want to try it:
https://github.com/shkhrgpt/tez-logging
The goal is that Dr. Elephant should be able to get data from the ATSv1 REST API, while the data is also written to protobuf so that nothing else is affected.
If you can, please try this and let me know if it works for you.
Hi @shkhrgpt Tez+Hive in Hive3 do log all query/dag events to a hive database called sys. Under the sys db, there are 2 tables query_data and dag_data. Those are the main two tables. If you can get Dr. Elephant to read from those two tables, then it will be able to process hive queries the same way as before.
Cloudera has a tool called "Data Analytics Studio". It does exactly this and presents the query in a web user interface. I believe if Dr. Elephant can parse the 2 tables below from Hive's sys db, it will be able to work in exactly the same way.
- query_data
- dag_data
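As a rough sketch of what reading those two tables could look like, the helper below takes any Python DB-API cursor connected to Hive and returns rows as dictionaries. The function name `fetch_sys_rows` is hypothetical, and because the column layout of the tables is not described in this thread, it selects all columns without assuming any names.

```python
ALLOWED_TABLES = ("query_data", "dag_data")


def fetch_sys_rows(cursor, table, limit=10):
    """Fetch up to `limit` rows from sys.<table> as a list of dicts.

    `cursor` is any DB-API 2.0 cursor connected to Hive (assumption: the
    connection is created elsewhere, e.g. by a Hive client library).
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table!r}")
    # The table name is checked against a fixed list and the limit is cast to
    # int, so building the statement with an f-string is safe here.
    cursor.execute(f"SELECT * FROM sys.{table} LIMIT {int(limit)}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

With a client such as PyHive, the cursor would come from something like `hive.connect(host=...).cursor()`; the host and any authentication settings depend on the cluster.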