spotify/spydra

Autoscaler should use Dataproc autoscaling

Opened this issue · 3 comments

Cloud Dataproc now natively supports autoscaling. Dataproc's autoscaling seems to be a superset of the functionality in Spydra's autoscaler. If you're interested, I'd be happy to take a stab at moving Spydra to Dataproc's autoscaler and getting rid of the init action.

The one major difference is that the minimum cooldown period (scaling interval) in Dataproc is 10 minutes, while Spydra's README suggests 2 minutes. Are folks at Spotify using scaling intervals that short?

I haven't looked closely at Dataproc's native autoscaler, but replacing our simple heuristic with it makes sense. Our overall strategy has been to fill in gaps and to replace our tools with officially supported ones as they become available. I would be happy to see a PR from your side.

The 10-minute interval should be fine. The only implication is that the cluster needs a reasonable initial size, so that jobs don't run up to 10 minutes longer while waiting for the first scale-up, but I believe that's a reasonable requirement.

Yup -- you can still set the initial cluster size using --num-workers and --num-preemptible-workers.

Hey, as an update: autoscaling just launched to Beta today! A few changes since alpha:

  1. The minimum cooldown period is now 2 minutes
  2. Monitoring autoscaling and cluster metrics is far easier now. We have common YARN and HDFS metrics on the cluster page (in the Cloud Console) and autoscaler logs to understand why the autoscaler made certain decisions (click on "View logs" and select just the autoscaler logs)
  3. You can also enable autoscaling, disable autoscaling, or switch autoscaling policies on clusters at any time. You can also update autoscaling policies live, without needing to touch the cluster.
  4. (Teaser) there's going to be a new shuffle service so you can actually autoscale clusters without killing in-progress jobs.
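
To make the cooldown point concrete, here is a minimal Python sketch that builds a dict shaped like a Dataproc autoscaling policy and rejects a cooldown shorter than the 2-minute Beta minimum mentioned above. The helper `build_policy` is hypothetical, the field names (`workerConfig`, `basicAlgorithm`, `yarnConfig`, and so on) follow the AutoscalingPolicy resource as I understand it, and the scale factors and timeout are placeholder values, so please check all of them against the current Dataproc docs before relying on this.

```python
from datetime import timedelta

# Minimum cooldown quoted in this thread for the Beta release.
MIN_COOLDOWN = timedelta(minutes=2)


def build_policy(min_workers, max_workers, cooldown):
    """Return a dict shaped like a Dataproc autoscaling policy.

    Hypothetical helper: field names are assumed from the
    AutoscalingPolicy resource; numeric values are placeholders.
    """
    if cooldown < MIN_COOLDOWN:
        raise ValueError(f"cooldown must be at least {MIN_COOLDOWN}")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "workerConfig": {
            "minInstances": min_workers,
            "maxInstances": max_workers,
        },
        "basicAlgorithm": {
            # Durations are written as a number of seconds, e.g. "120s".
            "cooldownPeriod": f"{int(cooldown.total_seconds())}s",
            "yarnConfig": {
                "scaleUpFactor": 0.5,    # placeholder value
                "scaleDownFactor": 1.0,  # placeholder value
                "gracefulDecommissionTimeout": "3600s",  # placeholder value
            },
        },
    }


policy = build_policy(2, 10, timedelta(minutes=2))
print(policy["basicAlgorithm"]["cooldownPeriod"])
```

Serializing the resulting dict to YAML or JSON would give something that could be imported as a policy, assuming the field names above are right.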

Now that the API is stable, I'd like to circle back and actually integrate native Dataproc autoscaling into Spydra. I think the easiest option would be to have users create autoscaling policies outside of Spydra, and then just specify the autoscaling policy to use in their Spydra config. WDYT?
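
Roughly what I have in mind, as a Python sketch rather than actual Spydra code: the user's cluster options gain one extra entry naming a policy that already exists, and that entry is passed through to `gcloud` like any other option. The function `gcloud_create_args`, the option key `autoscaling-policy`, and the policy ID `my-policy` are all assumptions for illustration. I believe `--autoscaling-policy` is the flag on `gcloud dataproc clusters create`, though during Beta it may only be available under `gcloud beta`.

```python
def gcloud_create_args(cluster_name, options):
    """Turn a dict of cluster options into a gcloud argument list.

    Hypothetical helper for illustration; this is not how Spydra
    is implemented, only a sketch of the pass-through idea.
    """
    args = ["gcloud", "dataproc", "clusters", "create", cluster_name]
    # Sort keys so the generated command is deterministic.
    for key, value in sorted(options.items()):
        args.append(f"--{key}={value}")
    return args


options = {
    "num-workers": 2,               # initial size, still set by the user
    "num-preemptible-workers": 0,
    "autoscaling-policy": "my-policy",  # hypothetical ID, created outside Spydra
}

print(" ".join(gcloud_create_args("example-cluster", options)))
```

The upside of this approach is that Spydra would not need to know anything about the policy's contents, so policy updates would not require a Spydra change.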