Fine-tuning results seem unstable
For the first run, the evaluation results at steps 200, 400, and 800 were:
{'eval_loss': 0.6967583894729614, 'eval_accuracy': 0.5025337837837838, 'eval_f1': 0.3344575604272063, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.2512668918918919, 'eval_recall': 0.5, 'eval_runtime': 3.1287, 'eval_samples_per_second': 378.436, 'eval_steps_per_second': 23.652, 'epoch': 0.15}
{'eval_loss': 0.6942448019981384, 'eval_accuracy': 0.49746621621621623, 'eval_f1': 0.332205301748449, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.24873310810810811, 'eval_recall': 0.5, 'eval_runtime': 3.0649, 'eval_samples_per_second': 386.309, 'eval_steps_per_second': 24.144, 'epoch': 0.3}
{'eval_loss': 0.6936447620391846, 'eval_accuracy': 0.5025337837837838, 'eval_f1': 0.3344575604272063, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.2512668918918919, 'eval_recall': 0.5, 'eval_runtime': 3.1124, 'eval_samples_per_second': 380.411, 'eval_steps_per_second': 23.776, 'epoch': 0.9}
At step 3000:
{'eval_loss': 0.6934958696365356, 'eval_accuracy': 0.49746621621621623, 'eval_f1': 0.332205301748449, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.24873310810810811, 'eval_recall': 0.5, 'eval_runtime': 3.0942, 'eval_samples_per_second': 382.652, 'eval_steps_per_second': 23.916, 'epoch': 2.4}
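For context, an eval_loss plateauing at ~0.693 with an MCC of 0 means the first run never learned anything: ln 2 ≈ 0.6931 is the cross-entropy of a binary classifier that assigns probability 0.5 to both classes, which is exactly the value the loss is stuck at above. A quick check:

```python
import math

# ln(2) is the binary cross-entropy of a classifier that always outputs p=0.5;
# eval_loss sitting at this value means the model is guessing, not learning.
print(math.log(2))  # 0.6931471805599453
```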
Then, on a second run, the evaluation results at steps 200, 400, and 800 were:
{'eval_loss': 0.6764485239982605, 'eval_accuracy': 0.543918918918919, 'eval_f1': 0.500248562558037, 'eval_matthews_correlation': 0.1047497035448165, 'eval_precision': 0.5646432374866879, 'eval_recall': 0.5424348347148706, 'eval_runtime': 3.1702, 'eval_samples_per_second': 373.483, 'eval_steps_per_second': 23.343, 'epoch': 0.15}
{'eval_loss': 0.6603909730911255, 'eval_accuracy': 0.7170608108108109, 'eval_f1': 0.7006777463594056, 'eval_matthews_correlation': 0.4870947362226591, 'eval_precision': 0.7747432713117492, 'eval_recall': 0.7158936240030817, 'eval_runtime': 3.0877, 'eval_samples_per_second': 383.453, 'eval_steps_per_second': 23.966, 'epoch': 0.45}
{'eval_loss': 0.3846745193004608, 'eval_accuracy': 0.8386824324324325, 'eval_f1': 0.8381642045361026, 'eval_matthews_correlation': 0.6827625873383905, 'eval_precision': 0.8437887048419396, 'eval_recall': 0.8389907406086374, 'eval_runtime': 3.088, 'eval_samples_per_second': 383.423, 'eval_steps_per_second': 23.964, 'epoch': 0.6}
I used the default parameters. Do I need to set a different learning rate (LR)?
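For reference, a minimal sketch of pinning the seed and lowering the learning rate, assuming the training script goes through Hugging Face `TrainingArguments`; the `output_dir` and the specific values below are placeholder guesses, not this repo's defaults:

```python
from transformers import TrainingArguments, set_seed

set_seed(42)  # fix Python/NumPy/PyTorch seeds so repeated runs are comparable

args = TrainingArguments(
    output_dir="out",                # hypothetical path
    learning_rate=1e-5,              # smaller than the ~3e-5 seen in the logs above
    warmup_ratio=0.1,                # warmup often helps BERT-style fine-tuning escape the ln(2) plateau
    num_train_epochs=3,
    per_device_train_batch_size=16,
    seed=42,                         # Trainer re-seeds from this at train() time
)
```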
Hello, has your issue been resolved? I'm not sure why the eval_loss keeps fluctuating around 0.69, and the eval_accuracy remains at 0.5.
{'loss': 0.6966, 'learning_rate': 2.9946319642130948e-05, 'epoch': 0.04}
0%|▎ | 100/24640 [00:25<1:38:50, 4.14it/s]***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6943994760513306, 'eval_accuracy': 0.5, 'eval_f1': 0.4395347531083056, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.4283, 'eval_samples_per_second': 1045.035, 'eval_steps_per_second': 32.667, 'epoch': 0.04}
{'loss': 0.6952, 'learning_rate': 2.9824318828792195e-05, 'epoch': 0.08}
1%|▋ | 200/24640 [01:02<1:44:08, 3.91it/s]***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6953960061073303, 'eval_accuracy': 0.5, 'eval_f1': 0.47613731222845423, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 13.3936, 'eval_samples_per_second': 969.717, 'eval_steps_per_second': 30.313, 'epoch': 0.08}
{'loss': 0.695, 'learning_rate': 2.9704758031720214e-05, 'epoch': 0.12}
1%|▉ | 300/24640 [01:40<1:35:35, 4.24it/s]***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6941527724266052, 'eval_accuracy': 0.5, 'eval_f1': 0.48044077783907113, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.8027, 'eval_samples_per_second': 1014.474, 'eval_steps_per_second': 31.712, 'epoch': 0.12}
{'loss': 0.6973, 'learning_rate': 2.9582757218381458e-05, 'epoch': 0.16}
2%|█▎ | 400/24640 [02:17<1:37:01, 4.16it/s]***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6983540654182434, 'eval_accuracy': 0.5, 'eval_f1': 0.47727597702653923, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.4288, 'eval_samples_per_second': 1044.996, 'eval_steps_per_second': 32.666, 'epoch': 0.16}
{'loss': 0.6967, 'learning_rate': 2.9460756405042703e-05, 'epoch': 0.2}
2%|█▋ | 500/24640 [02:55<1:42:51, 3.91it/s]***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.693518877029419, 'eval_accuracy': 0.5, 'eval_f1': 0.4317208432763909, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.349, 'eval_samples_per_second': 1051.749, 'eval_steps_per_second': 32.877, 'epoch': 0.2}
{'loss': 0.6949, 'learning_rate': 2.9338755591703947e-05, 'epoch': 0.24}
2%|█▉ | 600/24640 [03:32<1:44:46, 3.82it/s]***** Running Evaluation *****
Which dataset are you using? It is possible that the model fails to converge with some random seeds and hyperparameters. We observed the same phenomenon on the COVID dataset before.
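If the instability is seed-dependent, one option is a small seed sweep: rerun the same fine-tuning entry point under several seeds and keep the best run. A hedged sketch, where `run_finetune` is a hypothetical stand-in for whatever training function your script exposes:

```python
from transformers import set_seed

def run_finetune(seed: int) -> float:
    """Placeholder: fine-tune with the given seed and return final eval accuracy."""
    set_seed(seed)
    ...  # call your Trainer / training script here
    return 0.0

# try a handful of seeds and keep the one that converges
results = {seed: run_finetune(seed) for seed in (13, 42, 87, 100, 2024)}
best_seed = max(results, key=results.get)
print(f"best seed: {best_seed} -> eval_accuracy {results[best_seed]:.4f}")
```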