
communication-efficient distributed coordinate ascent

Written in Scala. Licensed under the Apache License 2.0.

CoCoA - A Framework for Communication-Efficient Distributed Optimization

New! ProxCoCoA+ provides support for L1-regularized objectives. See the paper and code.

We've added support for faster additive updates with CoCoA+. See more information here.
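The difference between averaging (CoCoA) and adding (CoCoA+) comes down to how the driver merges the K local updates in each communication round. Following the CoCoA+ paper, averaging scales the summed updates by γ = 1/K and uses subproblem parameter σ' = 1, while adding uses γ = 1 and σ' = K. The minimal sketch below illustrates only that merge step. The names `CombineUpdates`, `combine`, and `params` are hypothetical and are not part of this repository.

```scala
// Hypothetical sketch: merging the K local updates returned by the workers
// in one round of a CoCoA-style method (not the repository's actual code).
//   Averaging (CoCoA):  gamma = 1/K, local subproblems use sigma' = 1
//   Adding    (CoCoA+): gamma = 1,   local subproblems use sigma' = K
object CombineUpdates {
  type Vec = Array[Double]

  /** Returns w + gamma * (sum of all local deltas), computed elementwise. */
  def combine(w: Vec, deltas: Seq[Vec], gamma: Double): Vec = {
    val out = w.clone()
    for (d <- deltas; j <- out.indices) out(j) += gamma * d(j)
    out
  }

  /** Returns (gamma, sigmaPrime) for K workers under averaging or adding. */
  def params(k: Int, adding: Boolean): (Double, Double) =
    if (adding) (1.0, k.toDouble) else (1.0 / k, 1.0)
}
```

With two workers returning updates (1, 0) and (0, 2) from w = (0, 0), averaging yields (0.5, 1.0) while adding yields (1.0, 2.0). Adding takes larger steps, and the larger σ' makes each local subproblem more conservative to compensate.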

This code compares five distributed algorithms for training machine learning models using Apache Spark. The implemented algorithms are:

  • CoCoA+
  • CoCoA
  • mini-batch stochastic dual coordinate ascent (mini-batch SDCA)
  • stochastic subgradient descent with local updates (local SGD)
  • mini-batch stochastic subgradient descent (mini-batch SGD)
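For reference, the SGD baselines operate on the hinge-loss, L2-regularized SVM objective using subgradient steps. Below is a minimal sketch of a single mini-batch subgradient step. The Pegasos-style step size η = 1/(λt) and the names `MiniBatchSgd`, `step`, and `Point` are assumptions made for illustration, and the repository's own step-size schedule may differ.

```scala
// Hypothetical sketch of one mini-batch SGD step on the objective
//   (lambda/2)||w||^2 + (1/n) sum_i max(0, 1 - y_i w.x_i)
// The step size eta = 1 / (lambda * t) is an assumed, Pegasos-style schedule.
object MiniBatchSgd {
  type Vec = Array[Double]
  final case class Point(label: Double, features: Vec) // label in {-1, +1}

  def dot(a: Vec, b: Vec): Double = a.indices.map(j => a(j) * b(j)).sum

  /** Returns the iterate after one subgradient step on `batch` at iteration t >= 1. */
  def step(w: Vec, batch: Seq[Point], lambda: Double, t: Int): Vec = {
    val eta = 1.0 / (lambda * t)
    // Subgradient: lambda * w minus the batch average of y_i x_i over margin violators.
    val g = w.map(_ * lambda)
    for (p <- batch if p.label * dot(w, p.features) < 1.0; j <- g.indices)
      g(j) -= p.label * p.features(j) / batch.size
    w.indices.map(j => w(j) - eta * g(j)).toArray
  }
}
```

Only points that violate the margin (y_i w·x_i < 1) contribute to the hinge-loss part of the subgradient. Local SGD differs mainly in that each worker takes several such steps on its own partition before communicating.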

The code trains a standard SVM (hinge loss, L2-regularized) using SDCA as the local solver. It reports training and test error, as well as the duality gap certificate for the primal-dual methods. The code can easily be adapted to use other local solvers or to solve other objectives.
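The local SDCA step and the duality gap certificate can be sketched as follows for the hinge-loss SVM. This is a hypothetical Scala illustration, not the repository's code. It assumes dual variables α_i ∈ [0, 1] with w(α) = (1/(λn)) Σ α_i y_i x_i, and a subproblem parameter `sigmaP` that plays the role of σ' (1.0 for CoCoA or single-machine SDCA, K for CoCoA+ with adding). The names `SdcaSvm`, `localSdca`, `primal`, `dual`, and `gap` are made up for this example.

```scala
// Hypothetical sketch (not the repository's code) of SDCA for the SVM
//   P(w) = (lambda/2)||w||^2 + (1/n) sum_i max(0, 1 - y_i w.x_i)
//   D(a) = (1/n) sum_i a_i - (lambda/2)||w(a)||^2,  0 <= a_i <= 1,
//   w(a) = (1/(lambda n)) sum_i a_i y_i x_i
// sigmaP stands for the subproblem parameter sigma' (assumed: 1.0 for CoCoA).
object SdcaSvm {
  type Vec = Array[Double]
  final case class Point(label: Double, features: Vec) // label in {-1, +1}

  def dot(a: Vec, b: Vec): Double = a.indices.map(j => a(j) * b(j)).sum

  /**
   * One SDCA pass over the local indices `idx`, starting from the shared w.
   * Updates `alpha` in place and returns deltaW, the local change to w.
   */
  def localSdca(data: IndexedSeq[Point], idx: Seq[Int], alpha: Array[Double],
                w: Vec, lambda: Double, n: Int, sigmaP: Double): Vec = {
    val deltaW = new Array[Double](w.length)
    for (i <- idx) {
      val p = data(i)
      val qii = sigmaP * dot(p.features, p.features)
      if (qii > 0.0) {
        // Margin evaluated at w + sigma' * deltaW, as in the local subproblem.
        val margin = p.label * (dot(p.features, w) + sigmaP * dot(p.features, deltaW))
        // Closed-form coordinate maximizer, clipped to the box [0, 1].
        val newA = math.min(1.0, math.max(0.0, alpha(i) + lambda * n * (1.0 - margin) / qii))
        val scale = p.label * (newA - alpha(i)) / (lambda * n)
        for (j <- deltaW.indices) deltaW(j) += scale * p.features(j)
        alpha(i) = newA
      }
    }
    deltaW
  }

  def primal(data: IndexedSeq[Point], w: Vec, lambda: Double): Double =
    lambda / 2 * dot(w, w) +
      data.map(p => math.max(0.0, 1.0 - p.label * dot(w, p.features))).sum / data.size

  /** Dual objective; `w` must equal w(alpha). */
  def dual(alpha: Array[Double], w: Vec, lambda: Double): Double =
    alpha.sum / alpha.length - lambda / 2 * dot(w, w)

  /** Duality gap P(w(alpha)) - D(alpha): nonnegative, zero only at the optimum. */
  def gap(data: IndexedSeq[Point], alpha: Array[Double], w: Vec, lambda: Double): Double =
    primal(data, w, lambda) - dual(alpha, w, lambda)
}
```

Because P(w(α)) ≥ D(α) always holds, a small gap certifies that w(α) is close to optimal without knowing the optimum. The SGD baselines do not maintain dual variables, which is why the certificate is only available for the primal-dual methods.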

Getting Started

How to run the code locally:

sbt/sbt assembly
./run-demo-local.sh

(For the sbt script to run, make sure you have downloaded CoCoA into a directory whose path contains no spaces.)

References

The CoCoA+ and CoCoA algorithmic frameworks are described in more detail in the following papers:

Ma, C., Smith, V., Jaggi, M., Jordan, M. I., Richtarik, P., & Takac, M. Adding vs. Averaging in Distributed Primal-Dual Optimization. ICML 2015 - International Conference on Machine Learning.

Jaggi, M., Smith, V., Takac, M., Terhorst, J., Krishnan, S., Hofmann, T., & Jordan, M. I. Communication-Efficient Distributed Dual Coordinate Ascent. NIPS 2014 - Advances in Neural Information Processing Systems 27, pp. 3068–3076.

Smith, V., Forte, S., Jordan, M. I., Jaggi, M. L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework.