sberbank-ai-lab/LightAutoML

reproducibility

Ulitochka opened this issue · 3 comments

Hello everyone.

While using the library, we encountered a couple of cases in which we get different predictions on the same data with the same model configuration.

The first case
We train the model with these settings:

roles = {'target': 'label', 'text': ['text']}
task = Task('binary', metric='auc')

automl = TabularNLPAutoML(task=task,
    timeout=100000,
    general_params={'use_algos': ['nn', 'cb', 'lgb', 'linear_l2']},
    gpu_ids='0',
    reader_params={'n_jobs': 12},
    cpu_limit=13,
    text_params={'lang': 'ru'},

    nn_params={
        'lang': 'ru',
        'snap_params': {'k': 1, 'early_stopping': True, 'patience': 1, 'swa': False},
        'max_length': 256,
        'bs': 16,
        'bert_name': 'DeepPavlov/rubert-base-cased-conversational',
        'pooling': 'cls' },

    nn_pipeline_params={'text_features': 'bert'},
    autonlp_params={'model_name': 'random_lstm_bert'},
    gbm_pipeline_params={'text_features': 'embed'}, # tfidf embed
    linear_pipeline_params={'text_features': 'embed'},
    verbose=2
)

We predict on the test data and get result #1:

from sklearn.metrics import classification_report

def to_labels(pos_probs, threshold):
    # Convert predicted probabilities to 0/1 labels at the given threshold.
    return (pos_probs >= threshold).astype('int')

test_pred = automl.predict(test_pd)
labels = to_labels(test_pred.data[:, 0], 0.5)
print(classification_report(test_pd[roles['target']].values, labels, digits=4))

We repeat the training and get result #2 on the same test data, and result #1 != result #2.
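
To see how large the discrepancy actually is, it may help to compare the raw predicted probabilities of the two runs rather than only the thresholded labels. A minimal sketch, assuming the prediction arrays from run #1 and run #2 have been saved to disk (the file names below are hypothetical):

import numpy as np

# Hypothetical file names: probabilities saved after each of the two runs.
pred_run1 = np.load('test_pred_run1.npy')
pred_run2 = np.load('test_pred_run2.npy')

# Quantify the run-to-run difference in raw probabilities.
print('max abs diff :', np.abs(pred_run1 - pred_run2).max())
print('mean abs diff:', np.abs(pred_run1 - pred_run2).mean())

# Count how many labels flip at the 0.5 threshold.
flips = ((pred_run1 >= 0.5) != (pred_run2 >= 0.5)).sum()
print('labels that flip at 0.5:', flips)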

Could you please tell me what this behavior might be related to?
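
One usual suspect for this kind of behavior is unseeded randomness in the neural network and boosting components. As a first check, here is a minimal sketch of pinning the common global seeds before constructing and fitting the pipeline (which libraries actually matter here is an assumption, and GPU kernels can remain nondeterministic even with fixed seeds):

import os
import random
import numpy as np
import torch

SEED = 42  # arbitrary fixed value

os.environ['PYTHONHASHSEED'] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Ask cuDNN for deterministic kernels; this can slow training down.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False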

Hi @Ulitochka,

To figure out why the results are not equal, could you please share the training logs of both models?

Alex

@Ulitochka do you train the model on the same machine in both cases?
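
If the runs were done in different environments, a quick way to compare setups is to print the relevant versions and the GPU in use; a small sketch, assuming the packages were installed via pip:

import sys
from importlib.metadata import version  # Python 3.8+

import torch

print('python     :', sys.version.split()[0])
print('lightautoml:', version('lightautoml'))
print('torch      :', torch.__version__)
print('cuda       :', torch.version.cuda)
if torch.cuda.is_available():
    print('gpu        :', torch.cuda.get_device_name(0))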
