This repository contains our submission for the NAB Datathon, where our team, "Team GPT," secured 3rd place.
Team members
Matthew Lam: Lamlonghei888@gmail.com
Eric Kim
Robbie
Syukron
Deriving insights from the national waste management database:
Clean and preprocess the data so that it is ready for analysis.
Identify potential correlations or dependencies in the data.
Visualize the data using appropriate techniques to identify patterns and trends.
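As a rough illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between two tonnage series using only the standard library. The function name `pearson` and the sample values are made up for illustration; they are not taken from the dataset or from the team's notebook.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up yearly tonnages for two waste streams, for illustration only.
msw_tonnes = [10.0, 11.0, 12.5, 13.0]
ci_tonnes = [20.0, 21.5, 25.0, 26.5]
r = pearson(msw_tonnes, ci_tonnes)
print(f"Pearson r between the two streams: {r:.3f}")
```

A value of r close to 1 or -1 suggests a strong linear relationship between the two series, while a value near 0 suggests little linear dependence.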
Predicting future waste generation:
Use predictive modelling to forecast future waste generation, with the 'Tonnes' column as the target variable.
Evaluate the model using an appropriate evaluation metric.
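One simple way to approach this forecasting task is a least-squares linear trend over yearly totals, scored with mean absolute error (MAE) on held-out years. This is a minimal sketch under stated assumptions and not necessarily the model the team used; the helper names and the sample figures are hypothetical.

```python
def fit_linear_trend(years, tonnes):
    """Ordinary least-squares fit of tonnes = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(tonnes) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, tonnes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up yearly totals (not real figures from the dataset).
years = [2014, 2015, 2016, 2017, 2018, 2019]
tonnes = [60.0, 62.5, 63.5, 66.0, 68.5, 70.0]

# Chronological split: fit on the earlier years, evaluate on the later ones.
train_years, test_years = years[:4], years[4:]
train_tonnes, test_tonnes = tonnes[:4], tonnes[4:]

slope, intercept = fit_linear_trend(train_years, train_tonnes)
predictions = [slope * year + intercept for year in test_years]
mae = mean_absolute_error(test_tonnes, predictions)
print(f"MAE on held-out years: {mae:.2f} tonnes")
```

The split is chronological rather than random so that the model is always evaluated on years later than those it was fitted on, which mirrors how a forecast would be used.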
Year: Financial year. Data is presented for each year between 2006-07 and 2020-21 except 2007-08, 2011-12 and 2012-13, for which years a national data set was not developed.
Jurisdiction: State or territory in which the waste was generated.
Category: A broad classification of waste material.
Type: A more detailed classification of waste material. For example, the category 'Metals' may be split into 'Aluminium', 'Non-ferrous metals (ex. aluminium)', and 'Iron and steel'.
Classification: A reference field stating whether the particular row represents a 'type', 'category' or 'total' (totals collate categories in different ways).
Total type: Describes which category each data point classified as 'Total' refers to.
Stream: Describes the source of waste, comprising three options: municipal solid waste (MSW) from households and council operations; commercial and industrial (C&I) waste; and construction and demolition (C&D) waste (plus a total collating all three).
Management: Refers to the infrastructure that receives waste (e.g. landfill, compost facility, alternative waste treatment facility).
Fate: The ultimate destination of the waste, comprising five options: disposal; recycling; energy recovery; long-term storage; and waste reuse.
Tonnes: The quantity of waste.
Core or non-core: Lists whether the waste is 'core waste' or not.
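Because rows classified as 'type', 'category' and 'total' describe the same waste at different levels of detail, summing across all of them would likely count the same tonnes more than once. The sketch below keeps a single classification level and converts financial-year labels such as '2006-07' into a starting year. The column names come from the field list above; the helper names, the exact casing of the values in the file, and the sample rows are assumptions.

```python
def financial_year_start(label):
    """Return the starting calendar year of a label such as '2006-07'."""
    start, sep, _end = label.strip().partition("-")
    if not sep or len(start) != 4 or not start.isdigit():
        raise ValueError(f"unexpected financial year label: {label!r}")
    return int(start)


def tonnes_by_year(rows, classification="category"):
    """Sum 'Tonnes' per starting year, using one classification level only."""
    totals = {}
    for row in rows:
        # Skip rows at other levels of detail to avoid double counting.
        if row["Classification"].strip().lower() != classification:
            continue
        year = financial_year_start(row["Year"])
        totals[year] = totals.get(year, 0.0) + float(row["Tonnes"])
    return totals


# Made-up rows shaped like csv.DictReader output, for illustration only.
rows = [
    {"Year": "2006-07", "Classification": "Category", "Tonnes": "100.0"},
    {"Year": "2006-07", "Classification": "Type", "Tonnes": "40.0"},
    {"Year": "2006-07", "Classification": "Total", "Tonnes": "100.0"},
    {"Year": "2008-09", "Classification": "Category", "Tonnes": "120.0"},
]
yearly = tonnes_by_year(rows)
print(yearly)
```

Years with no national data set simply do not appear as keys in the result, so any gaps stay visible rather than being filled with zeros.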
Preprocess_data.ipynb cleans the database file to prepare it for analysis. The data is missing for 2005, 2012 and 2014, and mining data is only available for 2018-2021.
Our results are shown in NAB Presentation.pptx.
Enjoy exploring our NAB Datathon project!