
NAB Datathon - Team GPT (3rd Place)

Introduction

This repository contains our submission for the NAB Datathon, where our team, "Team GPT," secured 3rd place.

Team members

Matthew Lam: Lamlonghei888@gmail.com

Eric Kim

Robbie

Syukron

Project Overview

Deriving insights from the national waste management database

Clean and preprocess the data and make sure it is ready for analysis.

Identify potential correlations or dependencies in the data.

Visualize the data using appropriate techniques to identify patterns and trends.
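
The cleaning and correlation steps above can be sketched with the standard library alone (in a notebook, pandas would be the usual choice). This is a minimal illustration, not the team's Preprocess_data.ipynb: it assumes the data is exported as CSV with the columns listed under Data Overview, the sample rows and their values are made up, and the helper names clean_rows, yearly_totals and pearson are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict

# Made-up rows in the Data Overview column layout (illustration only, not real figures).
RAW = """Year,Jurisdiction,Stream,Classification,Tonnes
2006-07,NSW, MSW ,Type,100
2006-07,NSW,C&I,Type,210
2008-09,NSW,MSW,Type,120
2008-09,NSW,C&I,Type,
2008-09,NSW,C&I,Type,250
2009-10,NSW,MSW,Type,130
2009-10,NSW,C&I,Type,260
"""

def clean_rows(text):
    """Strip stray whitespace and drop rows whose Tonnes value is missing or non-numeric."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row = {k: v.strip() for k, v in row.items()}
        try:
            row["Tonnes"] = float(row["Tonnes"])
        except ValueError:
            continue  # blank or malformed Tonnes: skip the row
        rows.append(row)
    return rows

def yearly_totals(rows, stream):
    """Sum Tonnes per financial year for one waste stream."""
    totals = defaultdict(float)
    for row in rows:
        if row["Stream"] == stream:
            totals[row["Year"]] += row["Tonnes"]
    return dict(totals)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rows = clean_rows(RAW)
msw = yearly_totals(rows, "MSW")
ci = yearly_totals(rows, "C&I")
years = sorted(set(msw) & set(ci))  # correlate only years present in both streams
r = pearson([msw[y] for y in years], [ci[y] for y in years])
print(f"MSW vs C&I correlation: {r:.3f}")
```

A high coefficient on real data would only suggest the two streams move together over time; it does not by itself establish a dependency between them.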

Predicting future waste generation

Use predictive modelling to forecast future waste generation, with the ‘Tonnes’ column as the target variable.

Evaluate the model using an appropriate evaluation metric.
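
One simple baseline for the forecasting task is a least-squares linear trend on yearly Tonnes totals, evaluated on the most recent years held out in time order. The sketch below is an assumption about how such a baseline could look, not the model the team used: the totals are made up, the helper names are hypothetical, and MAE and RMSE are shown only as example evaluation metrics.

```python
import math

def fy_start(fy):
    """Convert a financial-year label such as '2006-07' to its starting year (2006)."""
    return int(fy.split("-")[0])

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up national totals in Tonnes (illustration only). The gaps at
# 2007-08, 2011-12 and 2012-13 mirror the years described under Data Overview.
history = {
    "2006-07": 100.0, "2008-09": 110.0, "2009-10": 116.0, "2010-11": 119.0,
    "2013-14": 131.0, "2014-15": 137.0, "2015-16": 140.0, "2016-17": 146.0,
}

years = sorted(history, key=fy_start)
train, test = years[:-2], years[-2:]  # time-ordered split: hold out the latest two years
a, b = fit_linear([fy_start(y) for y in train], [history[y] for y in train])
preds = [a + b * fy_start(y) for y in test]
actual = [history[y] for y in test]
print(f"MAE={mae(actual, preds):.2f} RMSE={rmse(actual, preds):.2f}")
print("2021-22 forecast:", round(a + b * 2021, 1))
```

Using the starting year as the x value keeps the missing years at their true spacing instead of treating consecutive records as consecutive years, and the time-ordered split avoids training on years that come after the ones being evaluated.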

Data Overview

Year: Financial year. Data is presented for each year between 2006-07 and 2020-21, except 2007-08, 2011-12 and 2012-13, for which a national data set was not developed.

Jurisdiction: State or territory in which the waste was generated.

Category: A broad classification of waste material.

Type: A more detailed classification of waste material. For example, the category 'Metals' may be split into 'Aluminium', 'Non-ferrous metals (ex. aluminium)' and 'Iron and steel'.

Classification: States whether the row represents a 'type', 'category' or 'total' (totals collate categories in different ways).

Total type: For data points classified as 'Total', describes which category the total refers to.

Stream: The source of the waste, with three options: municipal solid waste (MSW) from households and council operations; commercial and industrial (C&I) waste; and construction and demolition (C&D) waste. A total collating all three is also included.

Management: The infrastructure that receives the waste (e.g. landfill, compost facility, alternative waste treatment facility).

Fate: The ultimate destination of the waste, with five options: disposal; recycling; energy recovery; long-term storage; and waste reuse.

Tonnes: The quantity of waste.

Core or non-core: Whether the waste is classed as 'core waste' or not.
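
Because the Classification field mixes type, category and total rows, and Stream includes a total collating all three streams, summing Tonnes over every row counts the same waste more than once. A minimal sketch of one way to avoid this is to keep only type-level rows from the individual streams. The exact cell labels ('Type', 'MSW', 'C&I', 'C&D', 'Total') are assumptions that should be checked against the actual file, the rows are made up, and type_level_rows is a hypothetical helper.

```python
# Made-up rows (illustration only). The 'Category' row duplicates the two MSW
# 'Type' rows, and the stream-total row duplicates the MSW and C&I steel rows.
rows = [
    {"Category": "Metals", "Type": "Aluminium", "Classification": "Type", "Stream": "MSW", "Tonnes": 5.0},
    {"Category": "Metals", "Type": "Iron and steel", "Classification": "Type", "Stream": "MSW", "Tonnes": 20.0},
    {"Category": "Metals", "Type": "", "Classification": "Category", "Stream": "MSW", "Tonnes": 25.0},
    {"Category": "Metals", "Type": "Iron and steel", "Classification": "Type", "Stream": "C&I", "Tonnes": 40.0},
    {"Category": "Metals", "Type": "Iron and steel", "Classification": "Type", "Stream": "Total", "Tonnes": 60.0},
]

STREAMS = {"MSW", "C&I", "C&D"}  # assumed cell labels for the three individual streams

def type_level_rows(rows):
    """Keep only type-level rows from individual streams so each tonne is counted once."""
    return [r for r in rows
            if r["Classification"].lower() == "type" and r["Stream"] in STREAMS]

naive = sum(r["Tonnes"] for r in rows)
deduped = sum(r["Tonnes"] for r in type_level_rows(rows))
print(f"naive sum: {naive}, de-duplicated sum: {deduped}")
```

Filtering on 'Category' rows instead would give the same total at a coarser level; the point is to pick one level and one stream breakdown before aggregating.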

Data Wrangling

Preprocess_data.ipynb processes the Database file to prepare it for analysis. The data is missing files for 2005, 2012 and 2014, and mining data is only available for 2018-2021.
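
A quick coverage check makes gaps in the year range explicit before any analysis or modelling. This sketch assumes year labels follow the 'YYYY-YY' financial-year format described under Data Overview; fy_label and missing_years are hypothetical helpers, and the example input drops the three years that the Data Overview lists as having no national data set.

```python
def fy_label(start):
    """Format a starting year as a financial-year label, e.g. 2006 -> '2006-07'."""
    return f"{start}-{(start + 1) % 100:02d}"

def missing_years(present, first=2006, last=2020):
    """List financial years (by starting year, inclusive) absent from `present`."""
    seen = set(present)
    return [fy_label(y) for y in range(first, last + 1) if fy_label(y) not in seen]

# Example: every year from 2006-07 to 2020-21 except the three without a national data set.
present = [fy_label(y) for y in range(2006, 2021) if y not in (2007, 2011, 2012)]
gaps = missing_years(present)
print("Missing financial years:", gaps)
```

The same check can be run per group (for example, per Stream) by passing only that group's year labels, which shows when a source such as mining data starts appearing.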

Data Analysis

Our analysis and findings are presented in NAB Presentation.pptx.

Enjoy exploring our NAB Datathon project!