Data Algorithms with Spark by Mahmoud Parsian
"... This book will be a great resource for both readers looking to implement existing algorithms in a scalable fashion and readers who are developing new, custom algorithms using Spark. ..." Dr. Matei Zaharia, Original Creator of Apache Spark
Foreword by Dr. Matei Zaharia (Original Creator of Apache Spark)
Author: Mahmoud Parsian
- Goal of this book: enable writing efficient and simpler PySpark code for data algorithms using Spark
- This new O'Reilly book is the successor edition of Data Algorithms (published by O'Reilly)
- This book uses PySpark (much simpler and more readable)
- @OReillyMedia: Data Algorithms with Spark, by @mahmoudparsian
- Author Contact: [ Email ] [ Mahmoud Parsian @LinkedIn ] [ Mahmoud Parsian @GitHub ]
GitHub Chapter Solutions
- This GitHub repository will host all source code and scripts for Data Algorithms with Spark
- Chapter solutions are provided in PySpark and Scala
  - PySpark solutions are provided by Mahmoud Parsian
  - Scala solutions are provided by Deepak Kumar and Biman Mandal
Software:
Spark | Python | Scala | Java |
---|---|---|---|
Apache Spark 3.2.0 | Python 3.7.2 | Scala 2.13 | Java 8 |
Table of Contents
Chapter | Title |
---|---|
Bonus Chapters | |
Chapter 1 | Introduction to Data Algorithms |
Chapter 2 | Transformations in Action |
Chapter 3 | Mapper Transformations |
Chapter 4 | Reductions in Spark |
Chapter 5 | Partitioning Data |
Chapter 6 | Graph Algorithms |
Chapter 7 | Interacting with External Data Sources |
Chapter 8 | Ranking Algorithms |
Chapter 9 | Fundamental Data Design Patterns |
Chapter 10 | Common Data Design Patterns |
Chapter 11 | Join Design Patterns |
Chapter 12 | Feature Engineering in PySpark |