PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:
- Low Resistance Usability
- Easy Customization
- Scalable and Easier to Deploy
It has been built on the shoulders of giants like PyTorch (obviously) and PyTorch Lightning.
Although the installation includes PyTorch, the recommended approach is to install PyTorch first by following the official PyTorch installation instructions, picking the CUDA version that matches your machine.
Once you have PyTorch installed, run:
pip install pytorch_tabular[all]
to install the complete library with extra dependencies.
Or use:
pip install pytorch_tabular
for the bare essentials.
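To confirm which variant ended up in your environment, a quick check from Python can help. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and not part of PyTorch Tabular.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "pytorch_tabular" is the distribution name used in the pip commands above.
version = installed_version("pytorch_tabular")
if version is None:
    print("pytorch_tabular is not installed in this environment")
else:
    print(f"pytorch_tabular {version} is installed")
```

The same helper can be pointed at `torch` to check that the PyTorch build you installed beforehand is the one being picked up.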
The sources for pytorch_tabular can be downloaded from the GitHub repo.
You can clone the public repository:
git clone https://github.com/manujosephv/pytorch_tabular
Once you have a copy of the source, you can install it with:
python setup.py install
For complete documentation with tutorials, visit ReadTheDocs.
- FeedForward Network with Category Embedding is a simple feed-forward network with embedding layers for the categorical columns.
- Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data is a model presented at ICLR 2020 which, according to the authors, has beaten well-tuned Gradient Boosting models on many datasets.
- TabNet: Attentive Interpretable Tabular Learning is another model from Google Research which uses Sparse Attention in multiple steps of decision making to model the output.
- Mixture Density Networks is a regression model which uses Gaussian components to approximate the target function and provides a probabilistic prediction out of the box.
- AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks is a model which tries to learn interactions between the features automatically, builds a better representation from them, and then uses this representation in the downstream task.
- TabTransformer is an adaptation of the Transformer model for Tabular Data which creates contextual representations for categorical features.
- FT Transformer from Revisiting Deep Learning Models for Tabular Data
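To make the "probabilistic prediction" of a Mixture Density Network more concrete, here is a minimal sketch in plain Python of how a Gaussian mixture is turned into a density, a point estimate, and an uncertainty. It does not use PyTorch Tabular's API. The function names and the hard-coded mixture parameters are hypothetical; in a trained network the weights, means, and standard deviations would be produced by the model for each input row.

```python
import math
from typing import Sequence, Tuple


def gaussian_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of a univariate Gaussian N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(
    y: float,
    weights: Sequence[float],
    means: Sequence[float],
    sigmas: Sequence[float],
) -> float:
    """Density of the mixture: the weighted sum of the component densities."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, sigmas))


def mixture_mean_and_variance(
    weights: Sequence[float],
    means: Sequence[float],
    sigmas: Sequence[float],
) -> Tuple[float, float]:
    """Mean and variance of the mixture, usable as point estimate and uncertainty."""
    mean = sum(w * m for w, m in zip(weights, means))
    # E[y^2] of a mixture is the weighted sum of (sigma_k^2 + mu_k^2).
    second_moment = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, sigmas))
    return mean, second_moment - mean * mean


# Hypothetical parameters for a single input row (weights must sum to 1).
weights = [0.5, 0.5]
means = [0.0, 2.0]
sigmas = [1.0, 1.0]

mean, variance = mixture_mean_and_variance(weights, means, sigmas)
density_at_mean = mixture_pdf(mean, weights, means, sigmas)
print(mean, variance, density_at_mean)
```

Note that the mixture variance here is larger than either component's variance, because the spread between the component means also counts as uncertainty.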
To implement new models, see the How to implement new models tutorial. It covers basic as well as advanced architectures.
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig, ExperimentConfig

data_config = DataConfig(
    # target should always be a list. Multi-targets are only supported for regression;
    # multi-task classification is not implemented.
    target=["target"],
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
    gpus=1,  # index of the GPU to use; 0 means CPU
)
optimizer_config = OptimizerConfig()
model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU",  # Activation between layers
    learning_rate=1e-3,
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
# Train, then evaluate and predict on held-out data
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
# Save the model and load it back
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_from_checkpoint("examples/basic")
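The usage snippet assumes that `train`, `val`, `test`, `num_col_names`, and `cat_col_names` already exist. The following is a minimal sketch of one way to build them with pandas and NumPy. The toy dataset, its column names, and the 70/15/15 split are assumptions made purely for illustration; substitute your own DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names are illustrative only.
rng = np.random.default_rng(42)
n_rows = 1000
df = pd.DataFrame(
    {
        "age": rng.integers(18, 80, size=n_rows),
        "income": rng.normal(50_000, 15_000, size=n_rows),
        "city": rng.choice(["A", "B", "C"], size=n_rows),
        "target": rng.integers(0, 2, size=n_rows),
    }
)

target_col = "target"
feature_df = df.drop(columns=[target_col])

# Treat numeric columns as continuous and everything else as categorical.
num_col_names = feature_df.select_dtypes(include="number").columns.tolist()
cat_col_names = feature_df.select_dtypes(exclude="number").columns.tolist()

# Shuffle once, then cut into 70% train, 15% validation, 15% test.
shuffled = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
train_end = n_rows * 70 // 100
val_end = n_rows * 85 // 100
train = shuffled.iloc[:train_end]
val = shuffled.iloc[train_end:val_end]
test = shuffled.iloc[val_end:]

print(num_col_names, cat_col_names, len(train), len(val), len(test))
```

Splitting by dtype is only a heuristic: integer-coded categories (for example, a zip code stored as a number) would land in `num_col_names` and should be moved to `cat_col_names` by hand.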
- PyTorch Tabular – A Framework for Deep Learning for Tabular Data
- Neural Oblivious Decision Ensembles(NODE) – A State-of-the-Art Deep Learning Algorithm for Tabular Data
- Mixture Density Networks: Probabilistic Regression for Uncertainty Estimation
- Add GaussRank as Feature Transformation
- Add ability to use custom activations in CategoryEmbeddingModel
- Add differential dropouts (layer-wise) in CategoryEmbeddingModel
- Add Fourier Encoding for cyclic time variables
- Integrate Optuna Hyperparameter Tuning
- Add Text and Image Modalities for mixed modal problems
- Add Variable Importance
- Integrate SHAP for interpretability
DL Models
- DNF-Net: A Neural Architecture for Tabular Data
- Attention augmented differentiable forest for tabular data
- XBNet : An Extremely Boosted Neural Network
If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:
@misc{joseph2021pytorch,
    title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
    author={Manu Joseph},
    year={2021},
    eprint={2104.13638},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
- Zenodo Software Citation
@article{manujosephv_2021,
    title={manujosephv/pytorch_tabular: v0.7.0-alpha},
    DOI={10.5281/zenodo.5359010},
    abstractNote={Added a few more SOTA models - TabTransformer, FTTransformer.
        Made improvements in the model save and load capability.
        Made installation less restrictive by unfreezing some dependencies.},
    publisher={Zenodo},
    author={manujosephv},
    year={2021},
    month={May}
}