PyTorch implementation of Bezier simplex fitting.
The Bezier simplex is a high-dimensional generalization of the Bezier curve. It enables us to model a complex-shaped point cloud as a parametric hyper-surface in high-dimensional spaces. This package provides an algorithm to fit a Bezier simplex to given data points. To process terabyte-scale data, this package supports distributed training, real-time progress reporting, and checkpointing on top of PyTorch Lightning and MLflow.
See the following papers for technical details.
- Kobayashi, K., Hamada, N., Sannai, A., Tanaka, A., Bannai, K., & Sugiyama, M. (2019). Bézier Simplex Fitting: Describing Pareto Fronts of Simplicial Problems with Small Samples in Multi-Objective Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 2304-2313. https://doi.org/10.1609/aaai.v33i01.33012304
- Tanaka, A., Sannai, A., Kobayashi, K., & Hamada, N. (2020). Asymptotic Risk of Bézier Simplex Fitting. Proceedings of the AAAI Conference on Artificial Intelligence, 34(03), 2416-2424. https://doi.org/10.1609/aaai.v34i03.5622
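To make the notion of a Bezier simplex more concrete, here is a minimal sketch that evaluates one from its control points using the standard Bernstein form, where each multi-index `d` with entries summing to the degree contributes `multinomial(degree; d) * t^d * p_d`. The function names `multi_indices` and `bezier_simplex` and the dictionary layout of the control points are illustrative assumptions, not part of this package's API.

```python
from itertools import product
from math import factorial


def multi_indices(n_params, degree):
    """Yield all tuples of n_params non-negative integers that sum to degree."""
    for d in product(range(degree + 1), repeat=n_params):
        if sum(d) == degree:
            yield d


def bezier_simplex(control_points, t, degree):
    """Evaluate a Bezier simplex at the parameter t (a point on the simplex).

    control_points is a hypothetical layout: a dict mapping each multi-index d
    to its control point p_d, given as a list of floats.
    """
    n_values = len(next(iter(control_points.values())))
    x = [0.0] * n_values
    for d in multi_indices(len(t), degree):
        coef = factorial(degree)  # becomes the multinomial coefficient
        weight = 1.0              # becomes the monomial t^d
        for d_i, t_i in zip(d, t):
            coef //= factorial(d_i)
            weight *= t_i ** d_i
        for j, p_j in enumerate(control_points[d]):
            x[j] += coef * weight * p_j
    return x


# A degree-2 Bezier curve, i.e. a Bezier simplex with two parameters.
points = {(2, 0): [0.0, 0.0], (1, 1): [1.0, 2.0], (0, 2): [2.0, 0.0]}
print(bezier_simplex(points, t=[0.5, 0.5], degree=2))
```

With two parameters this reduces to the familiar Bezier curve; with three or more, the same formula describes a triangle-shaped or higher-dimensional patch.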
Python >=3.8, <3.11.
Download the latest Miniconda and install it. Then, install MLflow in your conda environment:
conda install -c conda-forge mlflow
Prepare data:
cat <<EOS > data.tsv
1 0
0.75 0.25
0.5 0.5
0.25 0.75
0 1
EOS
cat <<EOS > label.tsv
0 1
3 2
4 5
7 6
8 9
EOS
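If a shell with here-documents is not available, the same two files can be produced from Python. This is a small sketch; the helper `write_matrix` is a hypothetical name introduced here, and the values are the ones from the example above.

```python
from pathlib import Path

# Same records as in the shell example: one row per record.
data = [[1, 0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0, 1]]
label = [[0, 1], [3, 2], [4, 5], [7, 6], [8, 9]]


def write_matrix(path, rows, delimiter=" "):
    """Write a numerical matrix as text, one record per line."""
    lines = [delimiter.join(str(v) for v in row) for row in rows]
    Path(path).write_text("\n".join(lines) + "\n")


# The data and label files must describe the same number of records.
if len(data) != len(label):
    raise ValueError("data and label must have the same number of rows")

write_matrix("data.tsv", data)
write_matrix("label.tsv", label)
```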
Run the following command:
mlflow run https://github.com/rafcc/pytorch-bsf \
-P data=data.tsv \
-P label=label.tsv \
-P degree=3
This command automatically sets up the environment and runs an experiment:
- Download the latest pytorch-bsf into a temporary directory.
- Create a new conda environment and install dependencies in it.
- Run an experiment on the temporary directory and environment.
Parameter | Type | Default | Description |
---|---|---|---|
data | path | required | The data file. The file should contain a numerical matrix in the TSV format: each row represents a record that consists of features separated by Tabs or spaces. |
label | path | required | The label file. The file should contain a numerical matrix in the TSV format: each row represents a record that consists of outcomes separated by Tabs or spaces. |
degree | int | required | The degree of the Bezier simplex. |
header | int | 0 | The number of header lines in data/label files. |
delimiter | str | " " | The delimiter of values in data/label files. |
normalize | "max", "std", "quantile" | None | The data normalization: "max" scales each feature as the minimum is 0 and the maximum is 1, suitable for uniformly distributed data; "std" does as the mean is 0 and the standard deviation is 1, suitable for nonuniformly distributed data; "quantile" does as 5%-quantile is 0 and 95%-quantile is 1, suitable for data containing outliers; None does not perform any scaling, suitable for pre-normalized data. |
split_ratio | float | 0.5 | The ratio of training data against validation data. |
batch_size | int | 0 | The size of minibatch. The default uses all records in a single batch. |
max_epochs | int | 1000 | The number of epochs to stop training. |
accelerator | "auto", "cpu", "gpu", etc. | "auto" | Accelerator to use. See PyTorch Lightning documentation. |
devices | int | None | The number of accelerators to use. By default, use all available devices. See PyTorch Lightning documentation. |
num_nodes | int | 1 | The number of compute nodes to use. See PyTorch Lightning documentation. |
strategy | "dp", "ddp", "ddp_spawn", etc. | None | Distributed strategy. See PyTorch Lightning documentation. |
loglevel | int | 2 | What objects to be logged. 0: nothing; 1: metrics; 2: metrics and models. |
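The three `normalize` options can be illustrated on a single feature column. The sketch below follows the descriptions in the table only; it is not the package's code, and details such as the use of the population standard deviation and the "inclusive" quantile interpolation are assumptions made for the example. The function names are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def scale_max(col):
    """Map the minimum to 0 and the maximum to 1."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]


def scale_std(col):
    """Shift to mean 0 and scale to standard deviation 1 (population std assumed)."""
    mu, sigma = mean(col), pstdev(col)
    return [(v - mu) / sigma for v in col]


def scale_quantile(col):
    """Map the 5%-quantile to 0 and the 95%-quantile to 1."""
    cuts = quantiles(col, n=20, method="inclusive")  # 19 cut points at 5% steps
    q05, q95 = cuts[0], cuts[-1]
    return [(v - q05) / (q95 - q05) for v in col]


# First column of label.tsv from the example above.
col = [0, 3, 4, 7, 8]
for name, scale in [("max", scale_max), ("std", scale_std), ("quantile", scale_quantile)]:
    print(name, scale(col))
```

All three are affine rescalings of each feature and differ only in which two statistics are pinned. A constant column would cause a division by zero in this sketch.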
pip install pytorch-bsf
This package provides a command line interface to train a Bezier simplex with a dataset file.
Execute the `torch_bsf` module:
python -m torch_bsf \
--data=data.tsv \
--label=label.tsv \
--degree=3
Train a model with `fit()`, and call the model to predict.
import torch
import torch_bsf
# Prepare training data
ts = torch.tensor(  # parameters on a simplex
    [
        [3/3, 0/3, 0/3],
        [2/3, 1/3, 0/3],
        [2/3, 0/3, 1/3],
        [1/3, 2/3, 0/3],
        [1/3, 1/3, 1/3],
        [1/3, 0/3, 2/3],
        [0/3, 3/3, 0/3],
        [0/3, 2/3, 1/3],
        [0/3, 1/3, 2/3],
        [0/3, 0/3, 3/3],
    ]
)
xs = 1 - ts * ts # values corresponding to the parameters
# Train a model
bs = torch_bsf.fit(params=ts, values=xs, degree=3)
# Predict by the trained model
t = [[0.2, 0.3, 0.5]]
x = bs(t)
print(f"{t} -> {x}")
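The ten parameter rows written out by hand above form a regular grid on the simplex with spacing 1/3. For other sizes, such a grid can be generated instead of typed. The helper `simplex_grid` below is a hypothetical, pure-Python sketch and not a function of `torch_bsf`; its result can be passed to `torch.tensor(...)` in the same way as the literal list.

```python
def simplex_grid(n_params, n_steps):
    """List all points whose coordinates are multiples of 1/n_steps and sum to 1."""

    def compositions(k, total):
        # Yield every way to write total as an ordered sum of k non-negative integers.
        if k == 1:
            yield (total,)
            return
        for head in range(total, -1, -1):
            for tail in compositions(k - 1, total - head):
                yield (head,) + tail

    return [[c / n_steps for c in comp] for comp in compositions(n_params, n_steps)]


grid = simplex_grid(n_params=3, n_steps=3)
for row in grid:
    print(row)
```

With `n_params=3` and `n_steps=3` the rows come out in the same order as in the example above, from `[3/3, 0/3, 0/3]` to `[0/3, 0/3, 3/3]`.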
See the documentation for more details: https://rafcc.github.io/pytorch-bsf/
RIKEN AIP-FUJITSU Collaboration Center (RAFCC)
MIT