/ds5110-spring23

Primary language: Jupyter Notebook. License: MIT.

DS5110: Big Data Systems

Welcome to the graduate course on Big Data Systems. Scalable big data systems are a central part of modern data science. This course covers topics including the design and use of parallel dataflow systems (MapReduce/Hadoop and Spark), scalable and parallel Python analytics frameworks, cloud data systems (cloud storage and cloud-native data processing), and machine learning systems. A major component of the course is hands-on programming with scalable analytics tools and cloud resources such as Google Cloud and Microsoft Azure.