- Graduate School, Leavey School of Business
- Department of Information Systems & Analytics
- Course MSIS 2627: Big Data Modeling & Analytics
- Big-Data-MapReduce Course @ Santa Clara University
- Class meeting dates:
- Start: September 23, 2019
- End: December 7, 2019
- Class hours:
- MSIS 2627-01 (92426): MW 5:45 PM – 7:20 PM
- MSIS 2627-02 (92427): MW 7:35 PM – 9:10 PM
- Instructor: Mahmoud Parsian
- Classroom: Lucas Hall 307
- Office: 216AA, 2nd Floor, Lucas Hall
- Office Hours: by appointment
1. PySpark Algorithms Book by Mahmoud Parsian
2. Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer
3. Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
- 1. A Very Brief Introduction to MapReduce by Diana MacLean
- 2. Introduction to MapReduce by Mahmoud Parsian
- MSIS 2627-01 (92426): Monday, 5:45 PM – 7:20 PM, PST
- MSIS 2627-02 (92427): Monday, 7:35 PM – 9:10 PM, PST
- Mon/Wed 5:45 pm classes
- Monday, December 9, 2019, 5:45 – 7:45 pm, PST
- Mon/Wed 7:35 pm classes
- Wednesday, December 11, 2019, 5:45 – 7:45 pm, PST
The main focus of this class is to cover the following concepts:
- Concepts of Big Data
- Distributed File Systems
- Distributed Computing
- Distributed and Parallel Algorithms
- MapReduce Paradigm
- MapReduce Algorithms
- Scale-out Architectures (using Hadoop, Spark, PySpark)
- Apache Spark
- Using Spark, PySpark, and Python to teach MapReduce and distributed computing
- How to use SQL on NoSQL data
- Amazon Athena
- Amazon Athena, S3, Data Partitioning
PySpark Algorithms
Data Algorithms: Recipes for Scaling up with Hadoop and Spark