- Fast AI multi-label classification POC - EN
- spaCy & Scikit-learn multi-label classification POC - EN, NL
- spaCy entity analysis POC - EN, NL
See the ML model here
- Construction and demolition waste
- Packaging waste and recyclables
- Electronic and electrical equipment
- Vehicle and oily wastes
- Healthcare and related wastes
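In a multi-label setup a single waste record may belong to several of these categories at once, so the target is commonly encoded as a multi-hot vector with one 0/1 slot per category. The sketch below does this in plain Python. The `CATEGORIES` order and the helper name `multi_hot` are hypothetical. scikit-learn's `MultiLabelBinarizer` performs the same kind of transformation.

```python
# Hypothetical fixed label order; any consistent order works.
CATEGORIES = [
    "Construction and demolition waste",
    "Packaging waste and recyclables",
    "Electronic and electrical equipment",
    "Vehicle and oily wastes",
    "Healthcare and related wastes",
]


def multi_hot(labels, categories=CATEGORIES):
    """Encode a record's labels as a 0/1 list aligned with `categories`."""
    unknown = set(labels) - set(categories)
    if unknown:
        # Fail loudly instead of silently dropping unexpected labels.
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if category in labels else 0 for category in categories]


# A record tagged with two categories gets two 1s.
example = multi_hot(["Packaging waste and recyclables", "Vehicle and oily wastes"])
```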
This project uses Fast AI tabular neural nets for the ML classification model:
- Using neural nets to analyze tabular data
- Loading data into a pandas DataFrame
- Using categorical variables for entity embeddings (more on embeddings)
- Using continuous variables (numeric values) as neural-net inputs
- Using 3 data sets: train, validation and test
*Unfortunately, for data-privacy reasons the required data is not included in this repo. Please reach out or send a message if you need it.
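To illustrate the categorical/continuous split and the idea behind entity embeddings, the sketch below separates the columns of a small, made-up pandas DataFrame by dtype and then looks up one vector per category code. The column names (`wasteType`, `weightKg`), the embedding size and the random embedding matrix are all hypothetical. In Fast AI the embedding matrices are trainable layers that the tabular model creates and learns itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real dataset is not part of this repo.
df = pd.DataFrame({
    "wasteType": ["packaging", "electronics", "packaging", "vehicle"],
    "pureOrMixed": [1, 0, 0, 1],          # already converted to 0/1
    "weightKg": [120.0, 35.5, 80.0, 410.0],
})


def split_columns(frame: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (categorical, continuous) column names based on dtype."""
    cat_names = [c for c in frame.columns
                 if not pd.api.types.is_numeric_dtype(frame[c])]
    cont_names = [c for c in frame.columns
                  if pd.api.types.is_numeric_dtype(frame[c])]
    return cat_names, cont_names


cat_names, cont_names = split_columns(df)

# Entity embedding idea: each category gets an integer code, and that code
# selects one row of an (n_categories x emb_dim) matrix. The matrix here is
# random; a neural net would learn its values during training.
waste_type = df["wasteType"].astype("category")
codes = waste_type.cat.codes.to_numpy()
n_categories = len(waste_type.cat.categories)
emb_dim = 3  # hypothetical embedding size
rng = np.random.default_rng(0)
embedding = rng.normal(size=(n_categories, emb_dim))
vectors = embedding[codes]  # one embedding row per DataFrame row
```

The two name lists correspond to the `cat_names` and `cont_names` arguments that Fast AI's tabular API (for example `TabularPandas`) takes.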
1. Translation services
- Google Translate API and a service account
- A client was set up to translate from Dutch (nl) to English (en)
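A minimal sketch of the nl to en step, assuming the `google-cloud-translate` package's v2 client (`google.cloud.translate_v2`) with service-account credentials supplied through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The helper names `make_google_translator` and `translate_unique`, the batch size, and the idea of translating only distinct strings to save API calls are assumptions, not necessarily what this repo does. The translator is passed in as a function, so the batching logic can run without network access.

```python
from typing import Callable, Dict, Iterable, List

Translator = Callable[[List[str]], List[str]]


def make_google_translator(source: str = "nl", target: str = "en") -> Translator:
    """Hypothetical helper wrapping the Google Cloud Translation v2 client.

    Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account JSON key.
    """
    from google.cloud import translate_v2 as translate  # imported lazily

    client = translate.Client()

    def _translate(texts: List[str]) -> List[str]:
        results = client.translate(
            texts, source_language=source, target_language=target, format_="text"
        )
        return [r["translatedText"] for r in results]

    return _translate


def translate_unique(texts: Iterable[str], translate_fn: Translator,
                     batch_size: int = 100) -> Dict[str, str]:
    """Translate each distinct non-empty string once; return a lookup table.

    Non-string values such as pandas NaN are skipped. batch_size is an
    assumed value, not a documented API limit.
    """
    unique = sorted({t for t in texts if isinstance(t, str) and t})
    lookup: Dict[str, str] = {}
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        translated = translate_fn(batch)
        if len(translated) != len(batch):
            raise ValueError("translator returned a different number of results")
        lookup.update(zip(batch, translated))
    return lookup
```

With a DataFrame, the lookup table could then be applied column-wise, e.g. `df["description"].map(translate_unique(df["description"], make_google_translator()))`.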
2. Augmenting data
- Treating Boolean-like field value overrides: fields with 2 string options become the integers `0` and `1`
- Fields such as `pureOrMixed`, with string values `pure` and `mixed`, become the integers 1 or 0, to be set later as continuous variables in the tabular learner
- Prefilling fields where possible: for example, the waste `description` field is prefilled with `euralCodeDescription` when undefined
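A pandas sketch of the two augmentation steps, using the field names from the list above. The direction `pure` to 1 and `mixed` to 0 is an assumption (the text only says the values become 1 or 0), as is treating whitespace-only descriptions as undefined. The helper names `binarize` and `prefill` and the sample rows are hypothetical.

```python
import pandas as pd


def binarize(frame: pd.DataFrame, column: str, mapping: dict) -> pd.DataFrame:
    """Replace a two-option string field with 0/1 integers.

    Raises if a value outside `mapping` appears, since it would map to NaN
    and NaN cannot be cast to int64.
    """
    out = frame.copy()
    out[column] = out[column].map(mapping).astype("int64")
    return out


def prefill(frame: pd.DataFrame, target: str, source: str) -> pd.DataFrame:
    """Fill missing or blank values of `target` from the `source` column."""
    out = frame.copy()
    blank = out[target].isna() | (out[target].astype(str).str.strip() == "")
    out.loc[blank, target] = out.loc[blank, source]
    return out


# Hypothetical sample rows.
df = pd.DataFrame({
    "pureOrMixed": ["pure", "mixed", "pure"],
    "description": ["broken concrete", None, "  "],
    "euralCodeDescription": ["concrete", "mixed packaging", "waste tyres"],
})
df = binarize(df, "pureOrMixed", {"pure": 1, "mixed": 0})  # assumed direction
df = prefill(df, "description", "euralCodeDescription")
```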
3. Creating 3 sets of data: train, validation and test
- Loaded into a pandas DataFrame
- For training the ML model: uses the train and validation data, which have rich fields
- For testing the ML model: uses the test data, which has missing fields
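One way to produce the three sets, assuming that "rich" rows are those with no missing values in a chosen list of required columns. The `required` columns, the 20% validation fraction, the seed and the helper name `split_sets` are hypothetical. Rows missing any required field become the test set; the remaining rows are shuffled and divided into training and validation sets.

```python
import pandas as pd


def split_sets(frame: pd.DataFrame, required: list, valid_frac: float = 0.2,
               seed: int = 42):
    """Return (train, valid, test) DataFrames."""
    complete = frame[required].notna().all(axis=1)
    test_df = frame[~complete]                      # rows with missing fields
    rich = frame[complete].sample(frac=1.0, random_state=seed)  # shuffle
    n_valid = int(round(len(rich) * valid_frac))
    valid_df = rich.iloc[:n_valid]
    train_df = rich.iloc[n_valid:]
    return train_df, valid_df, test_df


# Hypothetical sample rows: rows 2 and 6 lack a description, row 3 a weight.
df = pd.DataFrame({
    "description": ["a", "b", None, "d", "e", "f", None, "h", "i", "j", "k", "l"],
    "weightKg": [1, 2, 3, None, 5, 6, 7, 8, 9, 10, 11, 12],
})
train_df, valid_df, test_df = split_sets(df, ["description", "weightKg"])
```

Keeping the original index on each set makes it easy to pass the validation rows to a learner by index, which Fast AI's tabular data loaders accept through a `valid_idx` argument.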