Task: ML Feature Type Inference

This project is about inferring ML feature types (e.g., numeric, categorical, datetime, sentence) for columns of raw CSV files.
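As a quick illustration of why this is non-trivial (the column name and values below are made up): a syntactic CSV parser sees integers, but the appropriate ML feature type is categorical.

```python
import pandas as pd

# A ZIP-code-like column: pandas happily infers int64,
# but for ML purposes it should be treated as categorical.
df = pd.DataFrame({"zipcode": [92093, 92122, 92093, 92117]})

print(df["zipcode"].dtype)                      # int64 (syntactic type)
print(pd.api.types.infer_dtype(df["zipcode"]))  # "integer"
# The ML feature type we actually want here is "categorical".
```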

Benchmark Labeled Data

Benchmark-Labeled-Data/ contains our labeled dataset with the train/test partitions, corresponding metadata, raw CSV files, and our base featurization for ML models.
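A minimal sketch of loading the train/test partitions with pandas; the paths assume data_train.csv and data_test.csv (the file names referenced in the leaderboard section below) sit directly under Benchmark-Labeled-Data/, so adjust them to the actual repository layout.

```python
import pandas as pd

# Load the labeled train/test partitions (adjust paths as needed).
train = pd.read_csv("Benchmark-Labeled-Data/data_train.csv")
test = pd.read_csv("Benchmark-Labeled-Data/data_test.csv")

print(train.shape, test.shape)
print(train.columns.tolist())  # inspect the metadata/feature columns
```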

Source code

Models/ contains the source code (Jupyter notebooks) of the different ML models and the APIs we use to benchmark feature type inference for AutoML platforms.
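As one illustrative example of such an API-based baseline, Pandas's syntactic type inference can be mapped onto coarse feature types. This sketch is not the notebooks' actual code; the mapping rules here are invented purely for illustration.

```python
import pandas as pd

def pandas_baseline(series: pd.Series) -> str:
    """Map pandas's syntactic inference onto a coarse feature type.
    Illustrative only; the benchmark notebooks define the real mapping."""
    kind = pd.api.types.infer_dtype(series, skipna=True)
    if kind in ("integer", "floating", "mixed-integer-float"):
        return "numeric"
    if kind in ("datetime", "datetime64", "date"):
        return "datetime"
    if kind in ("string", "mixed"):
        return "categorical"
    return "context-specific"

col = pd.to_datetime(pd.Series(["2019-01-01", "2019-02-15"]))
print(pandas_baseline(col))  # datetime
```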

Pre-trained Models

Pre-trained Models/ contains the trained ML models, ready for inference.

Library

Library/ contains our models and featurization routines wrapped as functions in a Python library. It explains how to load the pre-trained models for inference.
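The sketch below conveys the intended flow only; the file name rf_model.pkl, the joblib serialization format, and the extract_features stub are all placeholders, and the real entry points are documented in Library/.

```python
import joblib
import pandas as pd

def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    """Toy stand-in for the library's featurization routines: a couple of
    simple per-column statistics. The real base featurization lives in
    Library/ and is much richer than this."""
    rows = []
    for col in df.columns:
        as_str = df[col].astype(str)
        rows.append({
            "distinct_ratio": df[col].nunique() / max(len(df), 1),
            "mean_str_len": as_str.str.len().mean(),
        })
    return pd.DataFrame(rows, index=df.columns)

# Placeholder file name; the actual serialized models live under
# Pre-trained Models/ and the loading steps are documented in Library/.
model = joblib.load("Pre-trained Models/rf_model.pkl")

df = pd.read_csv("my_table.csv")                 # any raw CSV
X = extract_features(df)
print(dict(zip(df.columns, model.predict(X))))   # one feature type per column
```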

Downstream Benchmark

Downstream-Benchmark/ contains links to the datasets, their source details, and the downstream model source code.

AutoML Benchmark

The following table presents the binarized (one-vs-rest) class-specific precision, recall, and accuracy of different approaches on the held-out test partition of our benchmark labeled data. A sketch of how such binarized metrics are computed follows the table.

| Feature Type | Metric | TFDV | Pandas | TransmogrifAI | AutoGluon | Log Reg | CNN | Rand Forest |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Numeric | Precision | 0.657 | 0.614 | 0.605 | 0.646 | 0.909 | 0.929 | 0.934 |
| Numeric | Recall | 1 | 1 | 1 | 1 | 0.943 | 0.941 | 0.984 |
| Numeric | Accuracy | 0.814 | 0.776 | 0.767 | 0.805 | 0.946 | 0.953 | 0.97 |
| Categorical | Precision | 0.396 | - | - | 0.667 | 0.808 | 0.846 | 0.913 |
| Categorical | Recall | 0.652 | - | - | 0.534 | 0.884 | 0.928 | 0.943 |
| Categorical | Accuracy | 0.691 | - | - | 0.831 | 0.925 | 0.945 | 0.966 |
| Datetime | Precision | 0.985 | 0.956 | 1 | 1 | 0.951 | 0.925 | 0.945 |
| Datetime | Recall | 0.475 | 0.915 | 0.454 | 0.844 | 0.972 | 0.965 | 0.972 |
| Datetime | Accuracy | 0.962 | 0.991 | 0.961 | 0.989 | 0.994 | 0.992 | 0.994 |
| Sentence | Precision | 0.472 | - | - | 0.516 | 0.913 | 0.725 | 0.865 |
| Sentence | Recall | 0.457 | - | - | 0.902 | 0.793 | 0.804 | 0.902 |
| Sentence | Accuracy | 0.951 | - | - | 0.956 | 0.987 | 0.977 | 0.989 |
| Not-Generalizable | Precision | - | - | - | 0.465 | 0.732 | 0.81 | 0.934 |
| Not-Generalizable | Recall | - | - | - | 0.53 | 0.732 | 0.66 | 0.86 |
| Not-Generalizable | Accuracy | - | - | - | 0.883 | 0.947 | 0.937 | 0.978 |
| Context-Specific | Precision | - | 0.08 | 0.074 | - | 0.747 | 0.741 | 0.859 |
| Context-Specific | Recall | - | 0.295 | 0.295 | - | 0.621 | 0.663 | 0.705 |
| Context-Specific | Accuracy | - | 0.609 | 0.582 | - | 0.944 | 0.946 | 0.961 |

A dash (-) marks feature types not reported for that approach.
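Here, binarized means each feature type is scored one-vs-rest. A minimal sketch of computing such per-class metrics with scikit-learn, on made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative ground-truth and predicted feature types for a few columns.
y_true = np.array(["numeric", "categorical", "numeric", "datetime"])
y_pred = np.array(["numeric", "numeric", "numeric", "datetime"])

for cls in ["numeric", "categorical", "datetime"]:
    # Binarize: the class of interest vs. everything else.
    t = (y_true == cls).astype(int)
    p = (y_pred == cls).astype(int)
    print(cls,
          precision_score(t, p, zero_division=0),
          recall_score(t, p, zero_division=0),
          accuracy_score(t, p))
```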

Leaderboard on our Labeled Data

We invite researchers and practitioners to use our labeled datasets and contribute better featurizations, models, and/or augmented data. By submitting results, you acknowledge that your held-out test results (data_test.csv) are obtained purely by training on the training set (data_train.csv). A minimal sketch of this train/test protocol follows the leaderboard table.

(P = precision, R = recall)

| Approach | 9-class Accuracy | Numeric (P / R) | Categorical (P / R) | Datetime (P / R) | Sentence (P / R) | URL (P / R) | Embedded Number (P / R) | List (P / R) | Not-Generalizable (P / R) | Context-Specific (P / R) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random Forest | 0.9259 | 0.934 / 0.984 | 0.913 / 0.943 | 0.945 / 0.972 | 0.865 / 0.902 | 0.968 / 0.938 | 0.929 / 0.929 | 1 / 0.827 | 0.934 / 0.86 | 0.859 / 0.705 |
| k-NN | 0.8796 | 0.946 / 0.94 | 0.874 / 0.884 | 0.914 / 0.952 | 0.841 / 0.796 | 1 / 0.909 | 0.842 / 0.885 | 0.87 / 0.769 | 0.838 / 0.801 | 0.681 / 0.722 |
| CNN | 0.8788 | 0.929 / 0.941 | 0.846 / 0.928 | 0.925 / 0.965 | 0.725 / 0.804 | 0.828 / 0.75 | 0.747 / 0.717 | 0.732 / 0.577 | 0.81 / 0.693 | 0.741 / 0.663 |
| RBF-SVM | 0.8761 | 0.921 / 0.944 | 0.855 / 0.885 | 1 / 0.963 | 0.879 / 0.624 | 0.967 / 0.879 | 0.955 / 0.972 | 0.542 / 0.907 | 0.832 / 0.796 | 0.768 / 0.676 |
| Logistic Regression | 0.8643 | 0.909 / 0.943 | 0.808 / 0.884 | 0.951 / 0.972 | 0.913 / 0.793 | 0.939 / 0.969 | 0.919 / 0.919 | 0.93 / 0.769 | 0.732 / 0.66 | 0.747 / 0.621 |
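A minimal sketch of the intended protocol, assuming a ground-truth column named label in the CSVs (the actual column name may differ; check the files in Benchmark-Labeled-Data/):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# "label" is a hypothetical column name for the ground-truth feature type;
# the remaining columns are assumed to be the base featurization.
train = pd.read_csv("Benchmark-Labeled-Data/data_train.csv")
test = pd.read_csv("Benchmark-Labeled-Data/data_test.csv")

X_train, y_train = train.drop(columns=["label"]), train["label"]
X_test, y_test = test.drop(columns=["label"]), test["label"]

# Train strictly on data_train.csv; evaluate once on data_test.csv.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("9-class accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```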