
Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark

Primary language: Clojure. License: Apache License 2.0 (Apache-2.0).

⚠️ Cook Scheduler Development Has Ceased

After seven years of developing Cook Scheduler, we have decided to archive the project. Cook will remain available on GitHub in archive mode, but no further development will occur.

When Cook was open sourced it solved difficult problems in on-premises, capacity-constrained data centers. Today, however, the embrace of the public cloud has changed the problems that need to be solved. This shift is also reflected in slowing community contribution to Cook and the emergence of many other open source projects in this space. Given this, it no longer makes sense for us to maintain Cook as an open source project.

We are thankful for the opportunity to have shared Cook with the community and grateful for your contributions. Two Sigma remains committed to supporting open source software. You can find out more about our other projects and contributions here: https://www.twosigma.com/open-source/.

Cook Scheduler

Welcome to Two Sigma's Cook Scheduler!

What is Cook?

  • Cook is a powerful batch scheduler, specifically designed to provide a great user experience when there are more jobs to run than your cluster has capacity for.
  • Cook is able to intelligently preempt jobs to ensure that no user ever needs to wait long to get quick answers, while simultaneously helping you to achieve 90%+ utilization for massive workloads.
  • Cook has been battle-hardened to automatically recover after dozens of classes of cluster failures.
  • Cook can act as a Spark scheduler, and it comes with a REST API, Java client, Python client, and CLI.

The core concepts documentation is a good place to start learning more.

Releases

Check the changelog for release info.

Subproject Summary

In this repository, you'll find several subprojects, each of which has its own documentation.

  • scheduler - This is the actual Mesos framework, Cook. It comes with a JSON REST API.
  • jobclient - This includes the Java and Python APIs for Cook, both of which use the REST API under the hood.
  • spark - This contains the patch to Spark to enable Cook as a backend.

Please visit the scheduler subproject first to get started.

Quickstart

Using Google Kubernetes Engine (GKE)

The quickest way to get Cook running locally against GKE is with Vagrant.

  1. Install Vagrant
  2. Install VirtualBox
  3. Clone this repo
  4. Run GCP_PROJECT_NAME=<gcp_project_name> PGPASSWORD=<random_string> vagrant up --provider=virtualbox to create the dev environment
  5. Run vagrant ssh to ssh into the dev environment

In your Vagrant dev environment

  1. Run gcloud auth login to log in to Google Cloud
  2. Run bin/make-gke-test-clusters to create GKE clusters
  3. Run bin/start-datomic.sh to start Datomic (the Cook database), then wait until it prints "System started datomic:free://0.0.0.0:4334/, storing data in: data"
  4. Run lein exec -p datomic/data/seed_k8s_pools.clj $COOK_DATOMIC_URI to seed some Cook pools in the database
  5. Run bin/run-local-kubernetes.sh to start the Cook scheduler
  6. Cook should now be listening locally on port 12321
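
To confirm that something is accepting connections on port 12321 before submitting jobs, a plain TCP check is enough. The following is a minimal sketch in Python using only the standard library; the helper name is_listening and the 2-second timeout are illustrative assumptions, not part of Cook.

```python
import socket


def is_listening(host: str = "localhost", port: int = 12321, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only checks that the port accepts connections; it does not verify
    that the process behind it is actually the Cook scheduler.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling is_listening() with no arguments checks localhost:12321, the port mentioned in the steps above.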

To test a simple job submission:

  1. Run cs submit --pool k8s-alpha --cpu 0.5 --mem 32 --docker-image gcr.io/google-containers/alpine-with-bash:1.0 ls to submit a simple job
  2. Run cs show <job_uuid> to show the status of your job (it should eventually show Success)
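
The cs CLI talks to the scheduler's JSON REST API, so the same job can in principle be described as a JSON document. The sketch below builds such a document in Python for the job submitted above. The field names (uuid, command, cpus, mem, max_retries, container, pool) are assumptions based on the general shape of Cook's job API and should be checked against the scheduler subproject's documentation before use; build_job_request is a hypothetical helper, not part of the Cook Python client.

```python
import json
import uuid


def build_job_request(command, cpus, mem, pool, docker_image):
    """Build a job-submission body as a plain dict.

    All field names here are assumptions for illustration; consult the
    scheduler's REST API documentation for the authoritative schema.
    """
    job = {
        "uuid": str(uuid.uuid4()),   # client-generated job identifier
        "command": command,          # shell command to run
        "cpus": cpus,                # requested CPU cores
        "mem": mem,                  # requested memory in MB
        "max_retries": 1,            # assumed field: attempts before giving up
        "container": {               # assumed shape for a Docker container
            "type": "docker",
            "docker": {"image": docker_image},
        },
    }
    return {"jobs": [job], "pool": pool}


request_body = build_job_request(
    command="ls",
    cpus=0.5,
    mem=32,
    pool="k8s-alpha",
    docker_image="gcr.io/google-containers/alpine-with-bash:1.0",
)
print(json.dumps(request_body, indent=2))
```

The printed JSON could then be sent in a POST request to the scheduler listening on port 12321; the exact endpoint path and any authentication requirements depend on your configuration and are not shown here.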

To run automated tests:

  1. Run lein test :all-but-benchmark to run unit tests
  2. Run cd ../integration && pytest -m 'not cli' to run integration tests
  3. Run cd ../integration && pytest tests/cook/test_basic.py -k test_basic_submit -n 0 -s to run a particular integration test

Using Mesos

The quickest way to get Mesos and Cook running locally is with Docker and minimesos.

  1. Install Docker
  2. Clone this repo
  3. Run cd scheduler to enter the scheduler subproject
  4. Run bin/build-docker-image.sh to build the Cook scheduler image
  5. Run ../travis/minimesos up to start Mesos and ZooKeeper using minimesos
  6. Run bin/run-docker.sh to start the Cook scheduler
  7. Cook should now be listening locally on port 12321

Contributing

Before we can accept your code contributions, please fill out the appropriate Contributor License Agreement in the cla folder and submit it to tsos@twosigma.com.

Disclaimer

Apache Mesos is a trademark of The Apache Software Foundation. The Apache Software Foundation is not affiliated with, connected to, or otherwise associated with Two Sigma, Cook, or this website in any manner, and it does not endorse or sponsor them.

© Two Sigma Open Source, LLC