This project addresses the task of inferring ML feature types from raw CSV files.
Benchmark-Labeled-Data/ contains our labeled dataset with the train/test partitions, corresponding metadata, raw CSV files, and our base featurization for ML models.
Models/ contains the source code (Jupyter notebooks) of the different ML models and the APIs we use to benchmark feature type inference for AutoML platforms.
Pre-trained Models/ contains the trained ML models, ready for inference.
Library/ contains our models and featurization routines wrapped as functions in a Python library, and explains how to load the pre-trained models for inference; a minimal usage sketch follows below.
Downstream-Benchmark/ contains links to the datasets, their source details, and the downstream model source code.
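For orientation, here is a minimal sketch of what inference with a pre-trained model could look like. The model file name and the toy per-column features below are illustrative assumptions, not the library's confirmed API; the actual loading helpers and featurization routines live in Library/.

```python
import joblib
import pandas as pd

def base_features(col: pd.Series) -> list:
    """Toy stand-in for the base featurization: a few per-column statistics."""
    vals = col.dropna().astype(str)
    return [
        col.isna().mean(),                            # fraction of missing values
        col.nunique() / max(len(col), 1),             # distinctness ratio
        vals.str.len().mean() if len(vals) else 0.0,  # mean value length
    ]

df = pd.read_csv("raw_data.csv")       # any raw CSV file (path assumed)
model = joblib.load("rf_model.pkl")    # assumed pickle of a trained model

# Featurize each column, then predict one feature type per column.
X = [base_features(df[c]) for c in df.columns]
print(dict(zip(df.columns, model.predict(X))))
```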
The following table presents the binarized class-specific accuracy, precision, and recall of different approaches on the held-out test set of our benchmark labeled dataset. A dash marks a feature type that the corresponding tool does not predict. A sketch of how these binarized metrics are computed follows the table.
| Feature Type | Metric | TFDV | Pandas | TransmogrifAI | AutoGluon | Log Reg | CNN | Rand Forest |
|---|---|---|---|---|---|---|---|---|
| Numeric | Precision | 0.657 | 0.614 | 0.605 | 0.646 | 0.909 | 0.929 | 0.934 |
| | Recall | 1 | 1 | 1 | 1 | 0.943 | 0.941 | 0.984 |
| | Accuracy | 0.814 | 0.776 | 0.767 | 0.805 | 0.946 | 0.953 | 0.97 |
| Categorical | Precision | 0.396 | - | - | 0.667 | 0.808 | 0.846 | 0.913 |
| | Recall | 0.652 | - | - | 0.534 | 0.884 | 0.928 | 0.943 |
| | Accuracy | 0.691 | - | - | 0.831 | 0.925 | 0.945 | 0.966 |
| Datetime | Precision | 0.985 | 0.956 | 1 | 1 | 0.951 | 0.925 | 0.945 |
| | Recall | 0.475 | 0.915 | 0.454 | 0.844 | 0.972 | 0.965 | 0.972 |
| | Accuracy | 0.962 | 0.991 | 0.961 | 0.989 | 0.994 | 0.992 | 0.994 |
| Sentence | Precision | 0.472 | - | - | 0.516 | 0.913 | 0.725 | 0.865 |
| | Recall | 0.457 | - | - | 0.902 | 0.793 | 0.804 | 0.902 |
| | Accuracy | 0.951 | - | - | 0.956 | 0.987 | 0.977 | 0.989 |
| Not-Generalizable | Precision | - | - | - | 0.465 | 0.732 | 0.81 | 0.934 |
| | Recall | - | - | - | 0.53 | 0.732 | 0.66 | 0.86 |
| | Accuracy | - | - | - | 0.883 | 0.947 | 0.937 | 0.978 |
| Context-Specific | Precision | - | 0.08 | 0.074 | - | 0.747 | 0.741 | 0.859 |
| | Recall | - | 0.295 | 0.295 | - | 0.621 | 0.663 | 0.705 |
| | Accuracy | - | 0.609 | 0.582 | - | 0.944 | 0.946 | 0.961 |
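"Binarized" here means each feature type is scored one-vs-rest. The sketch below shows one way to compute such per-class precision, recall, and accuracy with scikit-learn; the toy labels are made up for illustration, and the real labels come from the benchmark files.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy labels for illustration only.
y_true = ["Numeric", "Categorical", "Numeric", "Datetime", "Sentence", "Numeric"]
y_pred = ["Numeric", "Numeric", "Numeric", "Datetime", "Sentence", "Categorical"]

for cls in ["Numeric", "Categorical", "Datetime", "Sentence"]:
    t = [int(y == cls) for y in y_true]   # binarize: this type vs. rest
    p = [int(y == cls) for y in y_pred]
    print(f"{cls}: precision={precision_score(t, p, zero_division=0):.3f} "
          f"recall={recall_score(t, p, zero_division=0):.3f} "
          f"accuracy={accuracy_score(t, p):.3f}")
```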
We invite researchers and practitioners to use our labeled datasets and contribute better featurizations, better models, and/or augmented data. By submitting results to the leaderboard below, you acknowledge that your held-out test results (on data_test.csv) were obtained by training purely on the training set (data_train.csv); a protocol sketch follows the leaderboard table.
In the table below, P and R denote the binarized precision and recall for each feature type.

| Approach | 9-class Accuracy | Numeric P | Numeric R | Categorical P | Categorical R | Datetime P | Datetime R | Sentence P | Sentence R | URL P | URL R | Embedded Number P | Embedded Number R | List P | List R | Not-Generalizable P | Not-Generalizable R | Context-Specific P | Context-Specific R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random Forest | 0.9259 | 0.934 | 0.984 | 0.913 | 0.943 | 0.945 | 0.972 | 0.865 | 0.902 | 0.968 | 0.938 | 0.929 | 0.929 | 1 | 0.827 | 0.934 | 0.86 | 0.859 | 0.705 |
| k-NN | 0.8796 | 0.946 | 0.94 | 0.874 | 0.884 | 0.914 | 0.952 | 0.841 | 0.796 | 1 | 0.909 | 0.842 | 0.885 | 0.87 | 0.769 | 0.838 | 0.801 | 0.681 | 0.722 |
| CNN | 0.8788 | 0.929 | 0.941 | 0.846 | 0.928 | 0.925 | 0.965 | 0.725 | 0.804 | 0.828 | 0.75 | 0.747 | 0.717 | 0.732 | 0.577 | 0.81 | 0.693 | 0.741 | 0.663 |
| RBF-SVM | 0.8761 | 0.921 | 0.944 | 0.855 | 0.885 | 1 | 0.963 | 0.879 | 0.624 | 0.967 | 0.879 | 0.955 | 0.972 | 0.542 | 0.907 | 0.832 | 0.796 | 0.768 | 0.676 |
| Logistic Regression | 0.8643 | 0.909 | 0.943 | 0.808 | 0.884 | 0.951 | 0.972 | 0.913 | 0.793 | 0.939 | 0.969 | 0.919 | 0.919 | 0.93 | 0.769 | 0.732 | 0.66 | 0.747 | 0.621 |
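To make the submission protocol concrete, the sketch below fits a model on data_train.csv alone and reports accuracy on data_test.csv. The label column name "y_act" and the use of all remaining columns as features are assumptions about the file layout; consult the metadata in Benchmark-Labeled-Data/ for the actual schema.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

train = pd.read_csv("Benchmark-Labeled-Data/data_train.csv")
test = pd.read_csv("Benchmark-Labeled-Data/data_test.csv")

# "y_act" as the label column is an assumption for this sketch.
X_train, y_train = train.drop(columns=["y_act"]), train["y_act"]
X_test, y_test = test.drop(columns=["y_act"]), test["y_act"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)  # train only on data_train.csv
print("9-class accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```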