transformers.Trainer `load_best_model_at_end` doesn't work
michael-wang-enigma commented
Here's my code, which matches what I saw in examples/finetune.ipynb
in this repo, except for adding the callback to early-stop and load the best model.
training_args = TrainingArguments(
output_dir=f"models/{model_name}",
learning_rate=2e-5,
weight_decay=0.02,
others_lr=1e-5,
seed=42,
data_seed=42,
others_weight_decay=0.01,
lr_scheduler_type="linear", #cosine
warmup_ratio=0.1,
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
num_train_epochs=3,
eval_strategy="steps",
eval_steps=10,
save_total_limit=2,
dataloader_num_workers=8,
load_best_model_at_end=True,
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=data_train_gliner,
eval_dataset=data_test_gliner,
tokenizer=model.data_processor.transformer_tokenizer,
data_collator=data_collator,
callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
)
trainer.train()
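
For reference, a quick way to see which checkpoint the Trainer considers best is to inspect its state after training (best_model_checkpoint and best_metric are fields of transformers' TrainerState; the printed path below is only illustrative):

```python
# With load_best_model_at_end=True, the weights from best_model_checkpoint
# should already be loaded back into trainer.model after train() returns.
print(trainer.state.best_model_checkpoint)  # e.g. "models/<model_name>/checkpoint-NNN"
print(trainer.state.best_metric)            # best eval_loss seen during training

# Sanity check: evaluating the restored model should roughly reproduce best_metric.
print(trainer.evaluate()["eval_loss"])
```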
At the end of this, trainer.model
returns the model from the last epoch instead of the model with the best eval_loss.
Note that I also had to set a private attribute on the model and set the torch seed, because those weren't working either:
import torch
from gliner import GLiNER

torch.manual_seed(42) # otherwise training would give me random results each time
model_name = "knowledgator/gliner-multitask-large-v0.5"
model = GLiNER.from_pretrained(model_name)
model._keys_to_ignore_on_save = None # otherwise training would error saying this attribute was not found
UPDATE: even after setting the seed, I am getting a different eval_loss each time I do finetuning.
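
For the non-determinism, a fuller seeding setup than a bare torch.manual_seed would be something like the sketch below (transformers.set_seed and the torch deterministic flags are standard; whether GLiNER's data pipeline and the 8 dataloader workers actually respect them is an assumption on my part):

```python
import random
import numpy as np
import torch
from transformers import set_seed

# Seeds python's random, numpy, and torch (CPU and CUDA) in one call.
set_seed(42)

# Prefer deterministic kernels; warn_only avoids hard errors for ops
# that have no deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)
torch.backends.cudnn.benchmark = False

# With dataloader_num_workers=8, each worker also needs a fixed seed,
# e.g. via a worker_init_fn like this (hypothetical wiring into the
# Trainer's dataloaders):
def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
```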