multivac-kaggle-titanic

A simple example of the Kaggle Titanic competition using Apache Spark 2.2


Machine Learning from Disaster (Kaggle)


This repo is intended purely as a learning resource for anyone who is new to machine learning with Apache Spark. Competition page: https://www.kaggle.com/c/titanic

Environment and Tests

  • Scala 2.11.x
  • Apache Spark 2.2
  • Tested locally and on Cloudera (CDH 5.12)

How-To

  • sbt update
  • sbt "run local" - runs the code on your local machine
  • sbt package - builds the JAR to use with spark-submit
  • You can set the ParamGrid values for cross-validation in ParamGridParameters.scala
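
As a rough illustration of what a parameter grid for cross-validation can look like in Spark ML 2.2, here is a minimal sketch. It is not the contents of ParamGridParameters.scala: the object name ParamGridSketch, the choice of RandomForestClassifier, the column names, and the grid values are all assumptions made for the example.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Hypothetical sketch; not the repository's actual ParamGridParameters.scala.
object ParamGridSketch {
  // Assumed estimator and column names ("Survived" is the Kaggle label column,
  // "features" is an assumed assembled feature-vector column).
  val rf = new RandomForestClassifier()
    .setLabelCol("Survived")
    .setFeaturesCol("features")

  // 2 depths x 2 tree counts = 4 candidate parameter combinations.
  val paramGrid = new ParamGridBuilder()
    .addGrid(rf.maxDepth, Array(4, 8))
    .addGrid(rf.numTrees, Array(20, 50))
    .build()

  // Each combination is evaluated with 3-fold cross-validation.
  val cv = new CrossValidator()
    .setEstimator(rf)
    .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("Survived"))
    .setEstimatorParamMaps(paramGrid)
    .setNumFolds(3)
}
```

Given a training DataFrame with a numeric Survived column and a features vector column (both assumed here), calling ParamGridSketch.cv.fit on it would train one model per grid entry and fold and keep the best one. Larger grids multiply the training time, so it is usually worth starting with a few values per parameter.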

Re-used Codes

Code of Conduct

This, and all github.com/multivacplatform projects, are under the Multivac Platform Open Source Code of Conduct. Additionally, see the Typelevel Code of Conduct for specific examples of harassing behavior that are not tolerated.

Useful Links

Copyright and License

Code and documentation copyright (c) 2017-2019 ISCPIF - CNRS. Code released under the MIT license.