Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

ML Data Prep Zoo

A zoo of labelled datasets and ML models for data prep tasks. Please refer to our paper for more details.

Task 1 (t1): ML Feature Type Inference (Multi-class classification)

Task 2 (t2): Category Deduplication (Binary classification)

Task 3 (t3): Embedded Number Extraction (Sequence-to-sequence learning)

Task 4 (t4): Detect Anomalous Categories (Binary classification)

Task 5 (t5): Multiple Number Units Detection (Binary classification)

Task 6 (t6): List Domain Extraction (Sequence-to-set-of-sequence learning)
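To make the shape of one of these tasks concrete, here is a minimal rule-based sketch of Task 3 (Embedded Number Extraction). It is not the approach used in this repository, which frames the task as sequence-to-sequence learning; it only illustrates the kind of input and output involved. The function name `extract_embedded_number`, the regular expression, and the sample strings are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not taken from this repository):
# optional sign, digits with optional thousands separators, optional decimals.
_NUMBER = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")


def extract_embedded_number(cell: str) -> Optional[float]:
    """Return the first number embedded in a raw string cell, or None."""
    match = _NUMBER.search(cell)
    if match is None:
        return None
    # Drop thousands separators before converting to float.
    return float(match.group().replace(",", ""))


if __name__ == "__main__":
    # Made-up sample cells, not drawn from the labelled datasets.
    for raw in ["USD 1,250.50", "30 lbs", "n/a"]:
        print(repr(raw), "->", extract_embedded_number(raw))
```

A fixed pattern like this breaks on locale-specific formats and on cells that contain several numbers, which is one plausible reason to treat the task as a learning problem instead.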