Tabular Waste Data Classification Model POCs

  1. Fast AI multi-label classification POC - EN
  2. spaCy & Scikit-learn multi-label classification POC - EN, NL
  3. spaCy Entity Analysis POC - EN, NL

See the ML model here

Uses Dutch Waste Data

  • Construction and demolition waste
  • Packaging waste and recyclables
  • Electronic and electrical equipment
  • Vehicle and oily wastes
  • Healthcare and related wastes
Data is omitted from the repo for data sensitivity reasons.

Training of the Fast AI Machine Learning classification model:

This project uses Fast AI tabular neural nets for the ML classification model:

  • Using neural nets for analyzing tabular data
  • Loading data into Pandas DataFrame
  • Using categorical variables for entity embeddings (more on embeddings)
  • Using continuous variables (numeric values) for neural nets
  • Using 3 data sets: train, validation and test data
*Unfortunately, for data privacy reasons, the required data is not included in this repo. Please reach out or message if you will
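
The setup listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training code: it assumes the fastai v2 API (`fastai.tabular.all`), a single string-valued target column, and hypothetical helper names (`split_column_types`, `build_learner`). Treating every numeric column as continuous and everything else as categorical is also an assumption.

```python
import pandas as pd


def split_column_types(df: pd.DataFrame, target: str):
    """Return (categorical, continuous) feature names, excluding the target.

    Assumption: numeric columns are continuous, all others are categorical.
    """
    features = df.drop(columns=[target])
    cont_names = features.select_dtypes(include="number").columns.tolist()
    cat_names = [c for c in features.columns if c not in cont_names]
    return cat_names, cont_names


def build_learner(df: pd.DataFrame, target: str, valid_idx: list):
    """Build a tabular learner; requires fastai v2 to be installed."""
    # Imported lazily so split_column_types works without fastai.
    from fastai.tabular.all import (
        CategoryBlock, Categorify, FillMissing, Normalize,
        TabularDataLoaders, accuracy, tabular_learner,
    )

    cat_names, cont_names = split_column_types(df, target)
    dls = TabularDataLoaders.from_df(
        df,
        procs=[Categorify, FillMissing, Normalize],
        cat_names=cat_names,      # become entity embeddings
        cont_names=cont_names,    # fed to the net as normalized numbers
        y_names=target,
        y_block=CategoryBlock(),  # classification target
        valid_idx=valid_idx,      # row positions used for validation
        bs=64,
    )
    return tabular_learner(dls, metrics=accuracy)
```

A learner built this way would typically be trained with `learn.fit_one_cycle(n)`; the number of epochs and batch size here are placeholders.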

Treating The Data:

1. Translation services

  • Google Translate API and service account
  • A client was set up to provide the translations from nl to en
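
A translation client along these lines might look like the sketch below. It assumes the `google-cloud-translate` package (the `translate_v2` client) and a service account key file; the function names and the caching of repeated strings are illustrative choices, not taken from the repo.

```python
from typing import Callable, Dict, Iterable, List


def make_google_translator(key_path: str) -> Callable[[str], str]:
    """Return a function translating Dutch text to English.

    Assumes google-cloud-translate is installed and key_path points to a
    service account JSON key (hypothetical path supplied by the caller).
    """
    # Imported lazily so the caching helper below has no cloud dependency.
    from google.cloud import translate_v2 as translate

    client = translate.Client.from_service_account_json(key_path)

    def translate_nl_to_en(text: str) -> str:
        result = client.translate(
            text, source_language="nl", target_language="en", format_="text"
        )
        return result["translatedText"]

    return translate_nl_to_en


def translate_values(values: Iterable[str],
                     translate_fn: Callable[[str], str]) -> List[str]:
    """Translate each value, calling translate_fn once per distinct string."""
    cache: Dict[str, str] = {}
    translated = []
    for value in values:
        if value not in cache:
            cache[value] = translate_fn(value)
        translated.append(cache[value])
    return translated
```

Passing the translator in as a function keeps the caching logic testable offline and avoids paying for repeated translations of the same description.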

2. Augmenting data

  • Treating Boolean-like field value overrides: fields with 2 string options become integers 0 and 1
  • Fields such as pureOrMixed, with string values pure and mixed, become integers 1 or 0, to be set later as continuous variables in the tabular learner
  • Prefilling fields where possible, such as the waste description field, which is prefilled with euralCodeDescription when underdefined
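
The augmentation steps above could be expressed in pandas roughly as follows. The column names `pureOrMixed` and `euralCodeDescription` come from the text; `wasteDescription` is a hypothetical name for the waste description field, the choice of pure as 1 and mixed as 0 is assumed, and "underdefined" is interpreted here as missing or blank.

```python
import pandas as pd


def augment_waste_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Boolean-like strings encoded and descriptions prefilled."""
    out = df.copy()

    # Two-option string field becomes an integer flag (assumed: pure=1, mixed=0).
    out["pureOrMixed"] = out["pureOrMixed"].str.lower().map({"pure": 1, "mixed": 0})

    # Treat missing or whitespace-only descriptions as underdefined
    # (wasteDescription is a hypothetical column name).
    underdefined = out["wasteDescription"].fillna("").str.strip().eq("")
    out.loc[underdefined, "wasteDescription"] = out.loc[
        underdefined, "euralCodeDescription"
    ]
    return out
```

Working on a copy leaves the original DataFrame untouched, which makes it easier to compare raw and augmented values while iterating on the rules.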

3. Creating 3 sets of data: train, validation and test data

  • Loaded into pandas DataFrames
  • For training the ML model: uses train and validation data with rich fields
  • For testing the ML model: uses test data with missing fields
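
One possible reading of this split is sketched below: rows where all required fields are present are divided into train and validation sets, and rows with missing fields form the test set. How the repo actually builds the three sets is not shown, so the function name, the `required` column list, the validation fraction, and the seed are all assumptions.

```python
import pandas as pd


def split_datasets(df: pd.DataFrame, required: list,
                   valid_frac: float = 0.2, seed: int = 42):
    """Split into (train, valid, test) DataFrames.

    Assumes df has a unique index. Rows complete in all `required`
    columns are treated as "rich"; the rest become test data.
    """
    complete = df[required].notna().all(axis=1)
    rich = df[complete]
    test = df[~complete]

    # Hold out a random fraction of the rich rows for validation.
    valid = rich.sample(frac=valid_frac, random_state=seed)
    train = rich.drop(valid.index)
    return train, valid, test
```

Fixing the seed keeps the validation rows stable between runs, so metric changes reflect model changes rather than a different split.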