Scalable Machine Learning on Big Data using Apache Spark

By IBM on Coursera

About this Course

This course equips you with the skills to scale data science and machine learning (ML) tasks to Big Data sets using Apache Spark. Much real-world machine learning work involves data sets so large that they exceed the CPU, memory, and storage limits of a single computer.

Apache Spark is an open-source framework that uses cluster computing and distributed storage to process extremely large data sets efficiently and cost-effectively. Applied knowledge of Apache Spark is therefore a valuable asset and a potential differentiator for a machine learning engineer.
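Much of Spark's processing follows a data-parallel pattern: the data set is split into partitions, the same function runs on each partition independently, and the partial results are then combined. The following is a minimal, single-machine sketch of that map/reduce pattern in plain Python (a word count). The helper names (`split_into_partitions`, `map_partition`, `word_count`) are hypothetical, and a thread pool stands in for a cluster's executors, so the sketch shows the structure of the computation rather than any real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Hypothetical helper: round-robin records into num_partitions lists."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def map_partition(lines):
    """Map step: count words in one partition, independently of all others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts (associative and commutative)."""
    return left + right


def word_count(lines, num_partitions=4):
    partitions = split_into_partitions(lines, num_partitions)
    # Each partition is handled by its own worker thread here; on a cluster,
    # these workers would be executors running on different machines.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(map_partition, partitions))
    # Merge the per-partition results into one final count.
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    docs = ["Spark scales out", "Spark scales ML", "ML on big data"]
    print(word_count(docs))
```

In PySpark the same computation is typically written as `sc.parallelize(lines).flatMap(lambda line: line.lower().split()).map(lambda word: (word, 1)).reduceByKey(operator.add)`, with Spark handling partitioning, scheduling, and fault tolerance.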

After completing this course, you will be able to:

  • Gain a practical understanding of Apache Spark, and apply it to solve machine learning problems involving both small and big data.
  • Understand how to write parallel code that can run on thousands of CPUs.
  • Use large-scale compute clusters to apply machine learning algorithms to petabytes of data with Apache SparkML Pipelines.
  • Eliminate the out-of-memory errors that traditional machine learning frameworks raise when data doesn't fit in a computer's main memory.
  • Test thousands of different ML models in parallel to find the best-performing one, a technique used by many successful Kagglers.
  • (Optional) Run SQL statements on very large data sets using Apache SparkSQL and the Apache Spark DataFrame API.
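Testing many models in parallel works because each hyperparameter combination can be trained and scored independently of the others. Below is a minimal grid-search sketch, assuming a toy one-dimensional ridge regression and a fixed train/validation split; the functions `fit_ridge_1d`, `evaluate`, and `grid_search` are hypothetical, and threads are used only to keep the example self-contained (they do not speed up CPU-bound Python code).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fit_ridge_1d(xs, ys, alpha, fit_intercept):
    """Closed-form 1-D ridge regression; returns (slope, intercept)."""
    if fit_intercept:
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sxy / (sxx + alpha)
        return slope, my - slope * mx
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha), 0.0


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evaluate(params, train, valid):
    """Train one candidate model and score it on the validation split."""
    alpha, fit_intercept = params
    model = fit_ridge_1d(*train, alpha, fit_intercept)
    return params, mse(model, *valid)


def grid_search(train, valid, alphas, intercepts, max_workers=4):
    # Every combination of hyperparameters is one independent candidate.
    grid = list(product(alphas, intercepts))
    # Candidates share nothing, so they can be evaluated concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda p: evaluate(p, train, valid), grid))
    # Keep the candidate with the lowest validation error.
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    train = ([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # y = 2x + 1
    valid = ([5, 6], [11, 13])
    print(grid_search(train, valid, [0.0, 1.0, 10.0], [False, True]))
    # → ((0.0, True), 0.0)
```

In Spark ML the corresponding building blocks are `ParamGridBuilder` together with `CrossValidator` or `TrainValidationSplit`, whose `parallelism` parameter sets how many models are fitted concurrently on the cluster.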

Enrol now to learn the machine learning techniques for working with Big Data that have been successfully applied by companies like Alibaba, Apple, Amazon, Baidu, eBay, IBM, NASA, Samsung, SAP, TripAdvisor, Yahoo!, Zalando and many others.

NOTE: During the course, you will practice running machine learning tasks hands-on on an Apache Spark cluster provided by IBM at no charge, which you can continue to use afterwards.

Prerequisites:

  • Basic Python programming
  • Basic machine learning (optional introduction videos are provided in this course as well)
  • Basic SQL skills (for the optional content)

Week 1 Exercise 1: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/123b9fcd-a87d-4952-aff8-a58a8cffb991/view?access_token=991d359bb183bd1ae4a49123413ab657332c40966ee5d161a2714d8245f5b34f

Week 1 Exercise 2: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/a6272391-5ec5-4a4d-b311-a1748acad319/view?access_token=5ab2a8cf9519807a98576e7381325600fba2728def5d275bc366bb9060c6e26a

Week 1 Exercise 3: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/3bacb873-cd60-4844-8c53-a6da7da8e857/view?access_token=106b291a7ba041bab2afdd97b59812d4a50c0588b36cc17b30880b54b6dcd6cb


Week 2 Exercise 1: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/8976a59f-4a7f-4e44-b366-74cd0924403c/view?access_token=9889005537450968a894d9856e5c672d2f74dc4196b970037f8f9c1dbf633e9e

Week 2 Exercise 2: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/76a0b352-478a-4596-afb6-3f2c8eaa7056/view?access_token=3bb4ac9267f24fe48fedea6ed536677fd1e03814703d27fac448d2afe32501e7

Week 2 Exercise 3: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/16db54aa-f721-486e-9cf6-03c8f80a95d6/view?access_token=a2e6db6b7efe089be4c32a5200d5f5ae46867cc728dae0226314fc72a6e736ba


Week 3 Exercise 1: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/80454ce9-33e4-467b-99d9-7d0c7aaa71d5/view?access_token=98c877dc287a5e04a84915e52966ae99253c9994bc58420a337894e1f02a4845

Week 3 Exercise 2: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/5017db32-855e-4ba0-a46d-f68779434e75/view?access_token=016c19a8eeafb2d8799a492aef6d811e65649ad447fc11dfc96f2fd658abc8f1


Week 4 Exercise 1: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/d94c5ac1-fb9f-43bf-afaf-3bd5b3337c8f/view?access_token=9bbdddced7733f5dbdcd4488dd6e5251d09797ceda92a97d643afacd92051a67

Week 4 Assignment: https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/cc7063e6-06ce-438d-aaa4-1e97aa502238/view?access_token=44c4dcd48342a924b38498c3a4a8cb29d103993d7a5d200504bd2e0b7414e5ff