linkedin/dr-elephant

Tracking peak memory usage

itamarst opened this issue · 3 comments

Hi,

I am helping out a company that would like to use Dr. Elephant, and in particular would like to track peak memory execution stats and get recommendations based on them (should more or less memory be allocated?). They're using Spark 2.3, and concluded that it's not possible to track that specific thing with the current Dr. Elephant.
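To make the request concrete, here is a minimal sketch of the kind of recommendation we have in mind. It is not Dr. Elephant's heuristic API: the names `MemoryAdvice` and `advise_memory` and the 50%/90% thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryAdvice:
    utilization: float  # peak usage as a fraction of the allocation
    verdict: str        # "increase", "decrease", or "ok"


def advise_memory(allocated_bytes: int, peak_bytes: int,
                  low: float = 0.5, high: float = 0.9) -> MemoryAdvice:
    """Compare observed peak memory with the allocation.

    The `low` and `high` thresholds are illustrative assumptions,
    not values taken from Dr. Elephant or Spark.
    """
    if allocated_bytes <= 0:
        raise ValueError("allocated_bytes must be positive")
    utilization = peak_bytes / allocated_bytes
    if utilization >= high:
        verdict = "increase"   # peak is close to (or above) the limit
    elif utilization <= low:
        verdict = "decrease"   # most of the allocation is never used
    else:
        verdict = "ok"
    return MemoryAdvice(utilization, verdict)


if __name__ == "__main__":
    gib = 1024 ** 3
    # An executor given 4 GiB that peaked at 1 GiB is over-allocated.
    print(advise_memory(4 * gib, 1 * gib))
```

The point is that the recommendation depends on the observed peak rather than on an average, which is why the peak figure has to be available from Spark in the first place.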

I understand you're in the process of updating it, so I was curious how that process will go and how it might interact with external contributions:

  1. Do you have a timeline, e.g. for when newer Spark versions will work out of the box?
  2. Do you have a sense of how intrusive the changes are going to be? Will they be minor updates, or will they break PRs against the current code base?
  3. Do you expect to do the updates in private and then release them when done, or do development in the open?

Thank you!

Hi @itamarst,

  1. I have made the changes for Spark 2.3 and am in the process of making the changes for Spark 2.4. The Spark 2.3 changes are being tried out by several users and are in review. I am hopeful that they will be merged by mid-June at the latest. If you want to try out Spark 2.3 in the meantime, you can check out my personal branch; that would help get the changes merged sooner and would give you the changes right away.

  2. The changes are definitely major: the current Dr. Elephant supports Spark 1.x, and migrating to Spark 2.x (especially Spark 2.3/2.4) required a lot of changes. However, they are confined to the Fetcher part, and the Heuristics part is unaffected. So if your changes touch the SparkFetcher or FSFetcher class, your PRs will most likely break. You can gauge the extent of the migration by looking at the personal branch mentioned above.

  3. I am making the changes only in my public fork. I also post updates on the changes in issue #683.

So it turns out that there are a bunch of extra metrics one wants for peak memory, and it would be great to have them in Dr. Elephant... once they're in Spark. They are being added in a Spark PR that will hopefully be merged soon: apache/spark#29020

Thanks @itamarst for the update. I noticed that this PR has been merged; I will track which Spark 3.x release it lands in.