# kaggle-competitions-framework

A framework for fast prototyping and training of single models and ensembles. Training is driven by a data generator you pass in; the framework was initially designed around a K-Fold split so that it produces both out-of-fold (OOF) and test predictions.

Below is a short API overview; it does not cover every possible use case. For more information, please check out the examples.
## Table of Contents

- Data Loader
- Model Loader
- Run training
- Save results
- Contribution

## Data Loader

### Data Loader Initialization
```python
from data import DataLoader

dl_params = {
    'target': "target",
    'id': "ID_code"
}
data_loader = DataLoader(data_folder, **dl_params)
```
- `data_folder` is the path to a folder containing the `train.csv` and `test.csv` files.
- `dl_params` specifies the target and ID columns of your `train.csv` file. The target column is removed from the data before training starts.
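As a concrete illustration (the file contents below are hypothetical, loosely modeled on the `ID_code`/`target` columns above), this sketch mimics what the loader does with such a file: separate the target column from the features. It uses only the standard library, not the framework itself:

```python
import csv
import io

# Hypothetical contents of train.csv: an ID column, feature columns,
# and the target column that DataLoader strips out before training.
raw = io.StringIO()
writer = csv.writer(raw)
writer.writerow(["ID_code", "var_0", "var_1", "target"])
writer.writerow(["train_0", 8.9255, -6.7863, 0])
writer.writerow(["train_1", 11.5006, -4.1473, 1])

raw.seek(0)
rows = list(csv.DictReader(raw))

# Mimic the loader: pop the target into y, keep the remaining columns as features.
y = [int(r.pop("target")) for r in rows]
print(y)  # [0, 1]
```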
### Data Preprocessors
```python
from data.preprocessors import GenericDataPreprocessor
from sklearn.preprocessing import StandardScaler

class DropColumns(GenericDataPreprocessor):
    def __init__(self):
        pass

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def transform(self, X):
        return X.drop(['ID_code'], axis=1)

class ToNumpy(GenericDataPreprocessor):
    def __init__(self):
        pass

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def transform(self, X):
        # Convert the DataFrame to a numpy array
        return X.values

data_loader.preprocess(DropColumns, StandardScaler, ToNumpy)
```
Preprocessors can also be passed custom keyword arguments:

```python
from sklearn.preprocessing import MinMaxScaler

data_loader.preprocess(MinMaxScaler, feature_range=(-1, 1))
```
All preprocessors must implement `fit_transform` and `transform` methods, so sklearn transformers can be applied to the `DataLoader` directly. Custom preprocessors must inherit from the `GenericDataPreprocessor` base class.
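The contract is pure duck typing: anything exposing `fit_transform` and `transform` works. The sketch below shows how such a preprocessor chain could be applied in sequence; the `apply_preprocessors` helper and the toy transformers are illustrative, not part of the framework:

```python
class Shift:
    """Toy preprocessor: adds a constant to every value."""
    def __init__(self, offset=1):
        self.offset = offset

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def transform(self, X):
        return [x + self.offset for x in X]

class Scale:
    """Toy preprocessor: multiplies every value by a constant."""
    def __init__(self, factor=2):
        self.factor = factor

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def transform(self, X):
        return [x * self.factor for x in X]

def apply_preprocessors(X, *preprocessor_classes, **kwargs):
    # Instantiate each class with the given kwargs and chain fit_transform calls,
    # in the spirit of data_loader.preprocess(...).
    for cls in preprocessor_classes:
        X = cls(**kwargs).fit_transform(X)
    return X

print(apply_preprocessors([1, 2, 3], Shift, Scale))  # [4, 6, 8]
```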
### Data Generator

```python
from sklearn.model_selection import StratifiedKFold

data_loader.generate_split(StratifiedKFold,
                           n_splits=5, shuffle=True, random_state=42)
```
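`generate_split` accepts any splitter with a sklearn-style `split(X, y)` interface. To illustrate what stratification buys you here, the toy splitter below deals each class round-robin across folds so class proportions stay balanced; it is a simplified stand-in for `StratifiedKFold`, not the framework's code:

```python
from collections import defaultdict

def stratified_folds(y, n_splits):
    """Assign each sample index to a fold, keeping class proportions balanced:
    within each class, samples are dealt round-robin across the folds."""
    folds = defaultdict(list)
    counters = defaultdict(int)
    for idx, label in enumerate(y):
        folds[counters[label] % n_splits].append(idx)
        counters[label] += 1
    return [sorted(folds[k]) for k in range(n_splits)]

y = [0, 0, 0, 0, 1, 1]
for fold in stratified_folds(y, 2):
    # each fold receives two samples of class 0 and one of class 1
    print(fold, [y[i] for i in fold])
```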
## Model Loader

### Model Loader Initialization

Models are initialized with three main arguments: the model class, the loader parameters, and the model parameters.
```python
model_params = {
    'name': "dense_nn",
    'fit': "fit",
    'predict': "predict_proba",
    'pred_col': 1
}
nn_params = {
    'build_fn': dense_nn_model,
    'epochs': 25,
    'batch_size': 256,
    'verbose': 1
}
model = ModelLoader(KerasClassifier, model_params, **nn_params)
```
### Initialize a custom model

If the model does not have a sklearn-like interface, you can still create a custom model wrapper by inheriting from the `GenericModel` base class. The `fit` and `predict` functions must be implemented as follows:
```python
import lightgbm as lgb

class LightGbmTrainer(GenericModel):
    def __init__(self):
        self.lgb_params = {
            "objective": "binary",
            "metric": "auc",
            "boosting": "gbdt",
            "max_depth": 4,
            "learning_rate": 0.01,
            "bagging_fraction": 0.8,
            "tree_learner": "serial",
            "verbosity": 0,
        }

    def fit(self, train, cv):
        x_tr, y_tr = train
        x_cv, y_cv = cv
        trn_data = lgb.Dataset(x_tr, label=y_tr)
        val_data = lgb.Dataset(x_cv, label=y_cv)
        evals_result = {}
        self.model = lgb.train(self.lgb_params,
                               trn_data,
                               100000,
                               valid_sets=[trn_data, val_data],
                               early_stopping_rounds=3000,
                               verbose_eval=1000,
                               evals_result=evals_result)

    def predict(self, test):
        return self.model.predict(test)
```
```python
model_params = {
    'name': "lightgbm",
    'fit': "fit",
    'predict': "predict"
}
model = ModelLoader(LightGbmTrainer, model_params)
```
## Run training
```python
from sklearn.metrics import roc_auc_score

fit_params = {
    'use_best_model': True,
    'verbose': 100,
    'plot': True
}
predict_params = {}

results = model.run(data_loader, roc_auc_score, fit_params,
                    predict_params, verbose=True)
```
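Conceptually, a run like this is a standard cross-validation loop: fit on each training split, predict the held-out part into an out-of-fold vector, and score it with the metric. The schematic below illustrates that loop with a toy model and metric; all names are illustrative, and the real `run` also collects per-fold test predictions:

```python
def cross_validate(model_factory, X, y, splits, metric):
    """Schematic K-fold loop: build out-of-fold predictions, score with `metric`."""
    oof = [None] * len(y)
    for train_idx, val_idx in splits:
        model = model_factory()
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in val_idx:
            oof[i] = model.predict(X[i])
    return metric(y, oof)

# Toy model: always predicts the mean target of its training data.
class MeanModel:
    def fit(self, X, y):
        self.mean = sum(y) / len(y)

    def predict(self, x):
        return self.mean

X = [0, 1, 2, 3]
y = [1.0, 1.0, 3.0, 3.0]
splits = [([0, 1], [2, 3]), ([2, 3], [0, 1])]
mae = lambda t, p: sum(abs(a - b) for a, b in zip(t, p)) / len(t)
print(cross_validate(MeanModel, X, y, splits, mae))  # 2.0
```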
## Save results
```python
import os

current_file_path = os.path.abspath(__file__)  # to save this .py file
model.save(data_loader, results, current_file_path, preds_folder, models_folder)
```
Where:

- `models_folder` is the path where code sources are saved so that predictions can be reproduced later. In other words, it copies your `current_file_path` into `models_folder`.
- `preds_folder` is the path to the predictions folder (for future stacking/blending).
## Contribution

Feel free to send a pull request if you would like anything to be improved. We also use GitHub issues to track requests and bugs.