┼╂┼
∩_┃_∩
|ノ ヽ
/ ● ● |
| (_●_) ミ < There is absolutely no warranty. >
彡、 |∪| 、`\
/ __ ヽノ /´> )
(___) / (_/
Using this library, you can:
- Simplify the structuring of table data and feature engineering
- Simplify the training and hyperparameter search for ML tools with a scikit-learn API (including sklearn, lightgbm, catboost, etc.)
- Simplify the training of PyTorch models (including the use of AMP and parallelization across multiple GPUs)
- Customize training with the Hook/Callback interface (such as early stopping, logging functions integrated with wandb, etc.)
- Automate exploratory data analysis
- Use convenient functions for basic biostatistical analysis
- Wandb integration
- Upgrade to newer backend libraries
- Integration of TensorboardLogger into TorchLogger
- Automated hyperparameter tuning for lightgbm/xgboost/catboost.cv()
- Multi-node DDP
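To make the Hook/Callback idea from the list above more concrete, here is a minimal, library-independent sketch of an early-stopping callback. The class name, the `after_epoch` method, and the `run_training` helper are hypothetical and written only for illustration; they are not the kuma_utils API, which may look quite different.

```python
class EarlyStopping:
    """Hypothetical early-stopping callback (illustration only, not kuma_utils' class)."""

    def __init__(self, patience=3, maximize=True):
        self.patience = patience      # epochs to wait without improvement
        self.maximize = maximize      # True if a higher score is better
        self.best_score = None
        self.best_epoch = None
        self.bad_epochs = 0

    def after_epoch(self, epoch, score):
        """Record the score and return True when training should stop."""
        if self.best_score is None:
            improved = True
        elif self.maximize:
            improved = score > self.best_score
        else:
            improved = score < self.best_score

        if improved:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0       # a real callback would save a snapshot here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_training(scores, callback):
    """Toy loop: `scores` stands in for per-epoch validation metrics."""
    for epoch, score in enumerate(scores):
        if callback.after_epoch(epoch, score):
            return epoch              # epoch at which training stopped
    return len(scores) - 1


callback = EarlyStopping(patience=2, maximize=True)
stopped_at = run_training([0.70, 0.75, 0.74, 0.73, 0.80], callback)
print(stopped_at, callback.best_epoch, callback.best_score)
```

With `patience=2`, the loop stops after two consecutive epochs without improvement, so the final score of 0.80 is never reached. This trade-off is inherent to early stopping and is the reason the patience value usually needs tuning.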
pip install git+https://github.com/analokmaus/kuma_utils.git@v0.6.2 # Stable
pip install git+https://github.com/analokmaus/kuma_utils.git@master # Latest
IMPORTANT: On Apple silicon, installation can fail with an error related to lightgbm. If this happens, install lightgbm with the following command first, and then install kuma_utils.
pip install --no-binary lightgbm --config-settings=cmake.define.USE_OPENMP=OFF 'lightgbm==4.3.0'
pip install git+https://github.com/analokmaus/kuma_utils.git
git clone https://github.com/analokmaus/kuma_utils.git
cd kuma_utils
poetry install
or simply,
poetry add git+https://github.com/analokmaus/kuma_utils.git
WIP
- Exploratory data analysis
- Data preprocessing
- Train and validate scikit-learn API models
- Train PyTorch models on a single GPU
- Train PyTorch models on multiple GPUs
- Statistical analysis (propensity score matching)
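As background for the propensity score matching topic above, the following is a minimal sketch of greedy 1:1 nearest-neighbor matching with a caliper, written in plain Python. It assumes the propensity scores have already been estimated (for example by a classifier); the function name `match_nearest` and its arguments are hypothetical and do not reflect the interface of `PropensityScoreMatching` in this library.

```python
def match_nearest(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping unit id -> precomputed propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)         # controls not yet matched
    pairs = []
    # Visit treated units in score order so the result is deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda k: abs(available[k] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]       # each control is used at most once
    return pairs


treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
control = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}
print(match_nearest(treated, control, caliper=0.05))
```

Here `t3` stays unmatched because no control lies within the caliper. Dropping such units is what keeps the matched groups comparable, at the cost of a smaller sample.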
┣ visualization
┃ ┣ explore_data - Simple exploratory data analysis.
┃
┣ preprocessing
┃ ┣ SelectNumerical
┃ ┣ SelectCategorical
┃ ┣ DummyVariable
┃ ┣ DistTransformer - Distribution transformer for numerical features.
┃ ┣ LGBMImputer - Regression imputer for missing values using LightGBM.
┃
┣ stats
┃ ┣ make_demographic_table - Automated demographic table generator.
┃ ┣ PropensityScoreMatching - Fast propensity score matching that can use any scikit-learn API model as a backend.
┃
┣ training
┃ ┣ Trainer - Wrapper for scikit-learn API models.
┃ ┣ CrossValidator - Cross-validation wrapper.
┃ ┣ LGBMLogger - Logger callback for LightGBM/XGBoost/Optuna.
┃ ┣ StratifiedGroupKFold - Stratified group k-fold split.
┃ ┣ optuna - optuna modifications.
┃
┣ metrics - Universal metrics
┃ ┣ SensitivityAtFixedSpecificity
┃ ┣ RMSE
┃ ┣ Pearson correlation coefficient
┃ ┣ R2 score
┃ ┣ AUC
┃ ┣ Accuracy
┃ ┣ Quadratic weighted kappa
┃
┣ torch
┃ ┣ lr_scheduler
┃ ┃ ┣ ManualScheduler
┃ ┃ ┣ CyclicCosAnnealingLR
┃ ┃ ┣ CyclicLinearLR
┃ ┃
┃ ┣ optimizer
┃ ┃ ┣ SAM
┃ ┃
┃ ┣ modules
┃ ┃ ┣ Mish
┃ ┃ ┣ AdaptiveConcatPool2d/3d
┃ ┃ ┣ GeM
┃ ┃ ┣ CBAM2d
┃ ┃ ┣ GroupNorm1d/2d/3d
┃ ┃ ┣ convert_groupnorm - Convert all BatchNorm to GroupNorm.
┃ ┃ ┣ TemperatureScaler - Probability calibration for PyTorch models.
┃ ┃ ┣ etc...
┃ ┃
┃ ┣ TorchTrainer - PyTorch trainer.
┃ ┣ EarlyStopping - Early stopping callback for TorchTrainer. Saves a snapshot when the best score is achieved.
┃ ┣ SaveEveryEpoch - Saves a snapshot at the end of every epoch.
┃ ┣ SaveSnapshot - Snapshot callback.
┃ ┣ SaveAverageSnapshot - Moving average snapshot callback.
┃ ┣ TorchLogger - Logger.
┃ ┣ SimpleHook - Simple train hook for almost any task (see tutorial).
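For readers unfamiliar with the kappa metric listed under metrics above, here is a plain-Python sketch of Cohen's kappa with quadratic weights. It is a standalone illustration of the formula, not the implementation shipped in this library. The function name and signature are assumptions, and labels are assumed to be integers in `0..n_classes-1`.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights for integer labels 0..n_classes-1.

    Assumes n_classes >= 2 and that labels and predictions are not all
    concentrated in one identical class (otherwise the denominator is zero).
    """
    n = len(y_true)
    # Observed confusion matrix: rows are true labels, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal counts used to build the chance-agreement matrix.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Penalty grows with the squared distance between classes.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 2], n_classes=3))
print(quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], n_classes=3))
```

Perfect agreement gives a kappa of 1, while fully reversed ordinal predictions give a value near -1. The quadratic weights penalize a prediction that is two classes off four times as heavily as one that is a single class off.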
The source code in this repository is released under the MIT license.