This is the code repository for Machine Learning with Scala Quick Start Guide, published by Packt.
Leverage popular machine learning algorithms and techniques and implement them in Scala
Scala combines object-oriented and functional programming concepts, which makes it easy to build scalable and complex big data applications. This book is a handy guide for machine learning developers and data scientists who want to develop and train effective machine learning models in Scala.
This book covers the following exciting features:
- Get acquainted with JVM-based machine learning libraries for Scala such as Spark ML and Deeplearning4j
- Learn to use RDDs, DataFrames, and Spark SQL for analyzing structured and unstructured data
- Understand supervised and unsupervised learning techniques with best practices and pitfalls
- Learn classification and regression analysis with linear regression, logistic regression, Naïve Bayes, support vector machines, and tree-based ensemble techniques
- Learn effective ways of clustering analysis with dimensionality reduction techniques
If you feel this book is for you, get your copy today!
All of the code is organized into folders, for example, Chapter02.
The code will look like the following:
rawTrafficDF.select("Hour (Coded)", "Immobilized bus", "Broken Truck",
"Vehicle excess", "Fire", "Slowness in traffic (%)").show(5)
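For context, the snippet above assumes that a DataFrame named `rawTrafficDF` already exists. The following is only a minimal sketch of how such a DataFrame might be created with Spark 2.3+ before calling `select`. The object name `TrafficPreview`, the file path `data/traffic.csv`, the `;` delimiter, and the local master setting are all assumptions made for illustration and are not taken from the book, so adjust them to match the data file shipped with the relevant chapter.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical entry point; the object name is an assumption for illustration.
object TrafficPreview {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("TrafficPreview")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path and delimiter: point these at the chapter's actual data file.
    val rawTrafficDF: DataFrame = spark.read
      .option("header", "true")       // first row holds the column names
      .option("inferSchema", "true")  // let Spark guess the column types
      .option("delimiter", ";")
      .csv("data/traffic.csv")

    // Same projection as in the snippet above: show the first five rows.
    rawTrafficDF.select("Hour (Coded)", "Immobilized bus", "Broken Truck",
      "Vehicle excess", "Fire", "Slowness in traffic (%)").show(5)

    spark.stop()
  }
}
```

If the column names in your file differ from those passed to `select`, Spark raises an `AnalysisException`, so calling `rawTrafficDF.printSchema()` first is a quick way to check them.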
Following is what you need for this book: This book is for machine learning developers looking to train machine learning models in Scala without spending too much time and effort. Some fundamental knowledge of Scala programming and the basics of statistics and linear algebra are all you need to get started with this book.
With the following software and hardware, you can run all of the code files in the book (Chapters 1-7).
Chapter | Software required | OS required |
---|---|---|
1-3,6 | Spark: 2.3.0 (or higher), Hadoop: 2.7 (or higher), Java (JDK and JRE): 1.8+, Scala: 2.11.x (or higher), Eclipse Mars/Luna: latest, Maven Eclipse plugin: 2.9 or higher, Maven compiler plugin for Eclipse: 2.3.2 or higher, Maven assembly plugin for Eclipse: 2.4.1 or higher. Importantly, reuse the pom.xml file provided with the Packt supplementary material and adjust the versions listed above and the APIs as needed; the dependencies will then be managed accordingly. | Windows, Mac OS X, and Linux (Any) |
5 | Same as above plus the following: H2O version: 3.22.1.1, Sparkling Water version: 2.4.1, ADAM version: 0.23.0 | Windows, Mac OS X, and Linux (Any) |
7 | Same as above plus the following: Spark csv_2.11 version: 1.3.0, ND4J backend: nd4j-cuda-9.0-platform if a GPU is configured, otherwise nd4j-native, ND4J version: 1.0.0-alpha, DL4J version: 1.0.0-alpha, DataVec version: 1.0.0-alpha, Arbiter version: 1.0.0-alpha, Logback version: 1.2.3. | Windows, Mac OS X, and Linux (Any) |
Md. Rezaul Karim is a researcher, author, and data science enthusiast with a strong computer science background, plus 10 years of R&D experience in machine learning, deep learning, and data mining algorithms to solve emerging bioinformatics research problems by making them explainable. He is passionate about applied machine learning, knowledge graphs, and explainable artificial intelligence (XAI). Currently, he is working as a research scientist at Fraunhofer FIT, Germany. He is also a Ph.D. candidate at RWTH Aachen University, Germany. Before joining FIT, he worked as a researcher at the Insight Centre for Data Analytics, Ireland. Previously, he worked as a lead software engineer at Samsung Electronics, Korea.
If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.