
PruningCallback doesn't work

himkt opened this issue · 6 comments

himkt commented

Apart from #18

#18 (comment)

himkt commented


Thank you so much for diving into allennlp-optuna.

Which storage are you using? (It should be one of sqlite3, MySQL, PostgreSQL, or Redis.)
Could you also share a simple configuration that reproduces the problem?

vikigenius commented

Thanks for creating the issue, @himkt. I use the default sqlite3 storage.

local model_name = "models/distilroberta-base-msmarco-v1/0_Transformer";
local num_gpus = 8;
local data_base_url = "data/mydata/processed/";
local batch_size = std.parseInt(std.extVar('batch_size'));
local lr = std.parseJson(std.extVar('lr'));
local model = "my_model";
local dataset_reader = "my_reader";

{
  "train_data_path": data_base_url + "train.tsv.part*",
  "validation_data_path": data_base_url + "valid.tsv.part*",
  "dataset_reader": {
    "type": "sharded",
    "base_reader": {
      "type": dataset_reader,
      "query_tokenizer": {
        "type": "pretrained_transformer",
        "model_name": model_name,
        "max_length": 500,
      },
      "query_token_indexers": {
        "tokens": {
          "type": "pretrained_transformer",
          "model_name": model_name,
          "namespace": "tokens"
        }
      }
    }
  },
  'model': {
    'type': model,
    'transformer_model': model_name,
  },
  "data_loader": {
    "batch_size": batch_size,
    "shuffle": true
  },
  "distributed": {
    "cuda_devices": if num_gpus > 1 then std.range(0, num_gpus - 1) else 0,
  },
  "trainer": {
    "num_epochs": 10,
    "optimizer": {
      "type": "huggingface_adamw",
      "lr": lr,
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "correct_bias": true
    },
    "learning_rate_scheduler": {
      "type": "polynomial_decay",
    },
    "use_amp": true,
    "grad_norm": 1.0,
    "validation_metric": "+rec1",
    "epoch_callbacks": [
      {
        "type": "optuna_pruner"
      }
    ]
  }
}

This was the config I was using. You would have to change the models and dataset readers. I can try to reproduce the problem with a simpler example that uses predefined models, but it will take me a while since I won't be using the multi-GPU cluster for some time.
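For readers following along: the config reads `batch_size` and `lr` through `std.extVar` and then parses them, which suggests the values arrive as strings (as they would when taken from environment variables). Below is a minimal Python sketch of that round trip using only the standard library. The helper name `to_ext_vars` and the sample values are hypothetical, and `int` and `json.loads` merely stand in for `std.parseInt` and `std.parseJson`.

```python
import json


def to_ext_vars(params):
    """Stringify hyperparameters, as string-valued external variables would carry them."""
    return {name: str(value) for name, value in params.items()}


# Hypothetical sampled values for the two external variables in the config.
ext_vars = to_ext_vars({"batch_size": 32, "lr": 2e-5})

# int() plays the role of std.parseInt, json.loads the role of std.parseJson.
batch_size = int(ext_vars["batch_size"])
lr = json.loads(ext_vars["lr"])

print(ext_vars)
print(batch_size, lr)
```

The learning rate goes through a JSON parser rather than an integer parser because its string form uses scientific notation, which an integer parser would reject.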

himkt commented

@vikigenius Thank you for your help.

Let me ask a question: does this configuration work if it runs on a single GPU (that is, with distributed training disabled)?
The current implementation of the AllenNLP integration's pruning feature may not work in a distributed setting.

If your configuration works on a single GPU, I'll investigate the AllenNLP integration in Optuna. However, it may take time, because the mechanism that supports PruningCallback in the integration is relatively complicated (I implemented...) and I don't have a multi-GPU cluster right now.

Sorry for the inconvenience. 🙇

himkt commented

Related to optuna/optuna#1990.

himkt commented

FYI @vikigenius

I'm working on entirely refactoring the AllenNLP integration in Optuna (optuna/optuna#2796).
Once this PR is merged, PruningCallback should work with distributed training.

himkt commented

In Optuna v3.0.0a0, we finally introduced support for the pruning callback in distributed training.

pip install -U optuna==3.0.0a0
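
After upgrading, a study could be wired up roughly as follows. This is a minimal sketch, not the exact setup from this thread. The config path, serialization directory, search ranges, study name, and SQLite file are all hypothetical, and the `best_validation_rec1` metric key is an assumption based on the `+rec1` validation metric in the config above.

```python
def metric_key(validation_metric):
    """Map a trainer `validation_metric` such as "+rec1" to a metrics key (assumed naming)."""
    return "best_validation_" + validation_metric.lstrip("+-")


def objective(trial):
    # Imported lazily so the helper above is usable without Optuna installed.
    from optuna.integration import AllenNLPExecutor

    # The parameter names must match the std.extVar(...) names in the config.
    trial.suggest_int("batch_size", 16, 64)          # hypothetical range
    trial.suggest_float("lr", 1e-6, 1e-4, log=True)  # hypothetical range

    executor = AllenNLPExecutor(
        trial,
        "config.jsonnet",                # hypothetical path to the config
        f"result/trial_{trial.number}",  # hypothetical serialization directory
        metrics=metric_key("+rec1"),
    )
    return executor.run()


def build_study():
    import optuna

    # A pruner has to be set on the study for the pruning callback to have any effect.
    return optuna.create_study(
        study_name="pruning-demo",      # hypothetical name
        storage="sqlite:///optuna.db",  # hypothetical SQLite file
        direction="maximize",
        pruner=optuna.pruners.HyperbandPruner(),
        load_if_exists=True,
    )


# Usage: build_study().optimize(objective, n_trials=20)
```

A config with custom models and dataset readers, like the one above, would also need the `include_package` argument of `AllenNLPExecutor` so that those classes get registered.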