big_data_final_project_2023

This final project is for Big Data Analytics: "Advanced Analytics in Business [D0S07a]" and "Big Data Platforms & Technologies [D0S06a]" at KU Leuven.

Primary language: Jupyter Notebook

Final Project for Big Data Analytics: "Advanced Analytics in Business [D0S07a]" and "Big Data Platforms & Technologies [D0S06a]"

Group 3: Jierong Wen (r0912240), Shivam Verma (r0959919), Riya Goyal (r0959390), Marco Chi Chung Fong (r0865521), Ye Liu (r0918311)

Assignment 1: Predictive Modeling on Tabular Data

In this assignment, a predictive model is built to predict the price of an Airbnb listing, trained on a dataset of around 9k Airbnb apartments in Belgium. Our main goal is to construct the prediction model with the lowest RMSE (root mean square error) when predicting Airbnb prices. We build models with three algorithms (Random Forests, XGBoost and a neural network) and then select the one with the lowest RMSE. The code for this assignment is written in Python in Jupyter notebooks.
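To make the selection criterion concrete, here is a minimal Python sketch of RMSE, sqrt((1/n) * sum((y_i - y_hat_i)^2)), and of picking the model with the lowest score. It assumes each trained model (for example a Random Forest, XGBoost or neural network) has already produced price predictions on the same held-out validation rows. The function names, model names and toy numbers are hypothetical and are not taken from the project's results.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def select_best_model(y_true, predictions):
    """Return (best_name, scores) for the model with the lowest RMSE.

    `predictions` maps a (hypothetical) model name to its predicted prices
    on the same validation rows as `y_true`.
    """
    scores = {name: rmse(y_true, preds) for name, preds in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Toy validation prices and predictions (made-up numbers, not project results).
y_val = [100.0, 150.0, 80.0]
candidate_predictions = {
    "model_a": [110.0, 140.0, 85.0],
    "model_b": [102.0, 148.0, 81.0],
    "model_c": [120.0, 130.0, 60.0],
}
best_name, rmse_scores = select_best_model(y_val, candidate_predictions)
print(best_name, rmse_scores)
```

Because RMSE squares the errors before averaging, a few badly mispriced listings raise the score more than many small errors do, which is worth keeping in mind when comparing models on skewed price data.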

Assignment 2: Deep Learning on Images

In the second assignment, we explore image classification with a pre-trained convolutional neural network (CNN) on a dataset of nearly 117K images of restaurants listed in the Michelin Guide. Our main goal is to build a CNN model that classifies whether an image shows food or the restaurant's interior. The project is developed in PyTorch, a widely used Python deep learning framework valued for its flexibility and accessibility, and all of the project's code is public on GitHub.
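A pre-trained CNN adapted to this task typically ends in a layer that emits one raw score (logit) per class; a softmax turns those scores into probabilities, and the larger one decides food versus interior. The sketch below shows only that final decision step in plain Python, assuming two classes in the order food, interior (a hypothetical ordering) and made-up logits. In PyTorch the same step is usually written as `torch.softmax(logits, dim=1).argmax(dim=1)` on a batch tensor.

```python
import math

# Hypothetical label order for the classifier's two outputs.
CLASSES = ("food", "interior")

def softmax(logits):
    """Numerically stable softmax over one image's raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_labels(batch_logits):
    """Map each image's logits to (predicted class name, its probability)."""
    results = []
    for logits in batch_logits:
        probs = softmax(logits)
        idx = max(range(len(probs)), key=probs.__getitem__)
        results.append((CLASSES[idx], probs[idx]))
    return results

def accuracy(predicted, actual):
    """Fraction of images whose predicted class matches the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need non-empty, equal-length label lists")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Made-up logits for three images, as a CNN's final layer might emit them.
batch = [[2.0, 0.5], [-1.0, 1.0], [0.3, 0.1]]
preds = predict_labels(batch)
print(preds)
print(accuracy([name for name, _ in preds], ["food", "interior", "interior"]))
```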

Assignment 3: Predicting on Streamed Textual Data

This assignment aims to construct a predictive model using Spark Structured Streaming on textual data. The dataset is based on Steam: the model predicts the score (upvote or downvote) of user reviews of newly released games from the review text.
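The source does not say which model was trained inside the Spark pipeline. As a hedged, dependency-free sketch of the idea, the code below trains a multinomial Naive Bayes classifier (an illustrative choice) on labelled reviews and then scores reviews arriving in micro-batches, which mimics how Structured Streaming by default processes a stream as a sequence of small batches. It does not use Spark's API; the class and function names and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesReviewClassifier:
    """Multinomial Naive Bayes with Laplace smoothing; label 1 = upvote, 0 = downvote."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, reviews, labels):
        """Count words per class from a static batch of labelled reviews."""
        for text, label in zip(reviews, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict_one(self, text):
        """Return the label with the highest smoothed log-posterior."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in (0, 1):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Smoothed class prior, so a class with no examples does not hit log(0).
            score = math.log((self.doc_counts[label] + self.alpha) / (total_docs + 2 * self.alpha))
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def score_stream(model, micro_batches):
    """Yield (batch_id, predictions) for each incoming micro-batch of review texts."""
    for batch_id, batch in enumerate(micro_batches):
        yield batch_id, [model.predict_one(text) for text in batch]

# Toy training reviews (made up for illustration).
model = NaiveBayesReviewClassifier().fit(
    ["great game so fun", "love it great", "boring buggy refund", "terrible crashes boring"],
    [1, 1, 0, 0],
)
for batch_id, batch_preds in score_stream(model, [["fun and great"], ["buggy and boring", "great fun"]]):
    print(batch_id, batch_preds)
```

In an actual Spark job the same split usually holds: fit the pipeline on a static DataFrame of historical reviews, then apply the fitted model's `transform` to the streaming DataFrame of new reviews.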

Assignment 4: Graph Analytics

In the fourth assignment, we conducted a network analysis on a graph dataset scraped from Twitch, a popular live-streaming platform focused mainly on video games. The main objective is to investigate the underlying community structure of Twitch streamers using a community detection algorithm. More specifically, we perform community mining on both regular and blocked streamers, based on the games and tags associated with them, to analyze whether blocked streamers show any noticeable patterns in the games and tags they use. Furthermore, this graph analytics process may also help identify potential factors that contribute to streamers being blocked by Twitch.
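The source does not name the community detection algorithm that was used. As an illustrative stand-in, the sketch below links two streamers when they share games or tags (edge weight = number of shared items) and then runs a simple weighted label propagation with deterministic tie-breaking. The streamer IDs, games and tags are made up, and the functions are hypothetical helpers written with the standard library only.

```python
from collections import defaultdict
from itertools import combinations

def project_streamers(attrs):
    """Streamer-streamer graph: edge weight = number of shared games/tags."""
    graph = defaultdict(dict)
    for a, b in combinations(sorted(attrs), 2):
        shared = len(attrs[a] & attrs[b])
        if shared:
            graph[a][b] = shared
            graph[b][a] = shared
    return graph

def label_propagation(graph, nodes, max_iter=100):
    """Weighted label propagation; ties go to the smallest label so runs are reproducible."""
    labels = {node: node for node in nodes}
    for _ in range(max_iter):
        changed = False
        for node in sorted(nodes):
            neighbours = graph.get(node, {})
            if not neighbours:
                continue  # an isolated streamer keeps its own community
            weight_by_label = defaultdict(float)
            for neighbour, weight in neighbours.items():
                weight_by_label[labels[neighbour]] += weight
            top = max(weight_by_label.values())
            new_label = min(lab for lab, w in weight_by_label.items() if w == top)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=sorted)

# Toy streamers with the games and tags they use (made up for illustration).
streamer_attrs = {
    "s1": {"Fortnite", "FPS"},
    "s2": {"Fortnite", "FPS", "English"},
    "s3": {"FPS", "Fortnite"},
    "s4": {"Chess", "Educational"},
    "s5": {"Chess", "Educational", "English"},
}
graph = project_streamers(streamer_attrs)
print(label_propagation(graph, streamer_attrs))
```

To mirror the analysis described above, the same procedure could be run once on regular streamers and once on blocked streamers, and the games and tags that dominate each community compared between the two groups.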