Cloud and Big Data

Fall 2019

About the Course

Data is growing faster than ever before: more data has been created in the past two years than in all of previous history. By the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. Our accumulated data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes. The number of devices is also growing quickly. By 2020, there will be over 6.1 billion smartphone users globally and over 50 billion smart connected devices in the world, all built to collect and share data. Processing these large volumes of data to extract insights in real time presents new challenges and opportunities for existing parallel data processing platforms and cloud computing infrastructures.

This course introduces cloud computing and big data and demonstrates the core tools used to wrangle and analyze big data in the cloud. No prior experience is required: you will walk through hands-on examples with the Hadoop and Spark frameworks, two of the most common in the industry, and manage elastic processing environments using Amazon Web Services. You will also explore the basics of cloud services and cloud deployment models. You will become acquainted with commonly used industry terms, typical business scenarios and applications for the cloud, and the benefits and limitations inherent in the cloud as a new paradigm.

Main course site:

About the Projects

Extreme-scale data science, at the convergence of big data and massively parallel computing, is enabling simulation, modelling, and real-time analysis of complex natural and social phenomena at unprecedented scales. The aim of the projects is to gain practical experience of this interplay by applying parallel computation principles to solve a data-intensive problem.

These final projects each solve a data-intensive problem with parallel processing on the AWS cloud. Each team identified a data science problem, analysed its compute scaling requirements, collected the data, designed and implemented parallel software, and demonstrated the scaled performance of an end-to-end application.

Fall 2019 Projects

Presented on 12 and 17 December 2019

Group Number | Project Title | Team | Website
1 | Analysis of traffic incidents in the city of Madrid, 2010-2019 | Mario de los Santos, Jesús Ramos, Marcos Docampo Prieto-Puga | GitHub, Website
2 | Analysis of YouTube videos | Daniel Candil Vizcaino, Frederick Ernesto Borges Noronha, Guillermmo Sánchez-Mariscal González | GitHub, Website
3 | Market study of petrol stations in Spain | Diego Laguna, Joel García, Gonzalo Figueroa, Álvaro Antón, John Erik Ibarra | GitHub, Website
4 | Steam Analysis for Gamers | Arturo Barbero Pérez, Jesús Verdúguez Gervaso, Adrián Ogáyar Sánchez, Pedro Martínez Gamero | GitHub, Website
5 | Music popularities study | Simon Markmann, Tomislav Kravarscan, Valerio Moroni, Ena Rajković, Yurii Shcheholiev | GitHub, Website
6 | Demographic analysis | Andrés Ramiro Ramiro, Jorge López Melchor, Wenhui Lin, Wenbo Sun, Natalia Rodríguez-Peral Valiente | GitHub, Website
7 | Cloud-Based Machine-Learning Analysis of Previous and Hypothetical Armed Conflicts | Diego Isar Muñoz, Álvaro David Ortiz Marchut, Ricardo Rodrigo Ruíz, Jin Wang Xu, Carlos Bilbao Muñoz | GitHub, Website
8 | Analysis of seismic events for the Viking II | Miguel Ángel Castillo Moreno, Jorge García Cerros, Garbiel García García | GitHub, Website
9 | Air pollution | Veronika Yankova, Gasan Nazer, Jonas Lührs | GitHub, Website
10 | Bigdata airline delay & cancellation | Jaime Palazón, Iñigo García-Conde, Gerardo Parra, Iván Fermena | GitHub, Website