The flex-trees package is a set of tools and utilities for working with Decision Tree (DT) models in Federated Learning (FL). It is an extension of the FLEXible framework and is designed to be used with it.
flex-trees comes with some state-of-the-art decision tree models for federated learning. It also provides multiple tabular datasets to test the models.
The methods implemented in the repository are:
Model | Description | Citation |
---|---|---|
Federated ID3 | The ID3 model adapted to a federated learning scenario. | A Hybrid Approach to Privacy-Preserving Federated Learning |
Federated Random Forest | The Random Forest (RF) model adapted to a federated learning scenario. Each client builds an RF locally, then N trees are randomly sampled from each client to form a global RF composed of the trees retrieved from the clients. | Federated Random Forests can improve local performance of predictive models for various healthcare applications |
Federated Gradient Boosting Decision Trees | The Gradient Boosting Decision Trees model adapted to a federated learning scenario. A global hash table is first created to align the data between the clients without sharing it. The clients then build N trees (CART) iteratively: one client builds a tree, the tree is added to the ensemble, and the instance weights are updated so that the next client can build the next tree with the updated weights. | Practical Federated Gradient Boosting Decision Trees |
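
The tree-sampling step described for the Federated Random Forest can be illustrated with a small, standard-library-only sketch. This is not the flex-trees API: every function name here is hypothetical, one-split decision stumps stand in for full decision trees, and "N trees from each client" is read as N trees per client, which is an assumption about the description above.

```python
import random
from collections import Counter


def train_stump(rows, labels, rng):
    """Fit a one-split stump on a bootstrap sample (stand-in for a full tree)."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of the client data
    feature = rng.randrange(len(rows[0]))        # random feature, as in a random forest
    best = None
    for i in idx:
        threshold = rows[i][feature]
        left = [labels[j] for j in idx if rows[j][feature] <= threshold]
        right = [labels[j] for j in idx if rows[j][feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_tree(tree, row):
    feature, threshold, left_label, right_label = tree
    return left_label if row[feature] <= threshold else right_label


def build_local_forest(rows, labels, n_trees, rng):
    """Each client trains its own forest on its private data."""
    return [train_stump(rows, labels, rng) for _ in range(n_trees)]


def sample_global_forest(client_forests, n_per_client, rng):
    """Server side: randomly sample N trees from every client's forest."""
    global_forest = []
    for forest in client_forests:
        global_forest.extend(rng.sample(forest, n_per_client))
    return global_forest


def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


def make_client_data(n, rng):
    """Synthetic toy data: the label depends only on the first feature."""
    rows = [[rng.random(), rng.random()] for _ in range(n)]
    labels = [int(r[0] > 0.5) for r in rows]
    return rows, labels


rng = random.Random(0)
clients = [make_client_data(40, rng) for _ in range(3)]
local_forests = [
    build_local_forest(rows, labels, n_trees=10, rng=rng) for rows, labels in clients
]
global_forest = sample_global_forest(local_forests, n_per_client=4, rng=rng)
prediction = predict_forest(global_forest, [0.9, 0.1])
print(len(global_forest), prediction)
```

Only the fitted trees leave the clients in this sketch; the raw rows and labels stay local, which is the point of building the global forest by sampling.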
The tabular datasets available in the repository are:
Dataset | Description | Citation |
---|---|---|
Adult | Demographic information about individuals; the task is to predict whether a person's income is greater than 50K. | UCI Machine Learning Repository |
Breast Cancer | Information about breast cancer cases; the task is to predict whether the cancer is benign or malignant. | UCI Machine Learning Repository |
Credit Card | Information about credit card transactions; the task is to predict whether a transaction is fraudulent. | Kaggle |
ILPD | Information about Indian liver patients; the task is to predict whether a patient has liver disease. | UCI Machine Learning Repository |
Nursery | Information about nursery school applications; the task is to predict the acceptability of an application. | UCI Machine Learning Repository |
Bank Marketing | Information about bank marketing campaigns; the task is to predict whether the client will subscribe to a term deposit. | UCI Machine Learning Repository |
Magic Gamma | Information about MAGIC gamma telescope events; the task is to predict whether an event is gamma signal or background. | UCI Machine Learning Repository |
To get started with flex-trees, you can check the notebooks available in the repository. They cover the following topics:
- Federated ID3 with FLEXible.
- Federated Random Forest with FLEXible.
- Practical Federated Gradient Boosting Decision Trees with FLEXible.
We recommend Anaconda/Miniconda as the package manager. The following table lists the corresponding flex and flex-trees versions and the supported Python versions.
flex | flex-trees | Python |
---|---|---|
main / nightly | main / nightly | >=3.8, <=3.11 |
v0.6.0 | v0.1.0 | >=3.8, <=3.11 |
To install the package, you can use the following commands:

Using pip:

    pip install flextrees

Download the repository and install it locally:

    git clone git@github.com:FLEXible-FL/flex-trees.git
    cd flex-trees
    pip install -e .
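
After installing, a quick check along the following lines can confirm that the interpreter falls inside the supported Python range and that the package is visible to it. This is a minimal sketch: the helper names are hypothetical, and the distribution name `flextrees` is taken from the pip command above.

```python
import sys
from importlib import metadata

# Supported range from the version table: >=3.8, <=3.11 (compared on major.minor).
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 11)


def python_is_supported(version=None):
    """Return True if the given (or current) Python version is in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def installed_version(dist_name="flextrees"):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("flextrees version:", installed_version() or "not installed")
```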
## Citation
If you use this package, please cite the following paper:
TODO: Add citation