Question for reproducing the numbers of slot attention on CLEVR
kdwonn opened this issue · 3 comments
Hello. First of all, thank you for sharing valuable resources!
Thanks to this repo, I was able to easily kickstart research in a new field.
I have a question about reproducing the results on CLEVR10 using the provided .yaml training config.
When I run the experiment with the command python train_object_discovery model=slot-attention dataset=clevr ++num_workers=4, the ARI on the original dataset reaches only ~0.66, which suggests that the result is not properly reproduced.
I am attaching the training config and evaluation results together (BTW, the logging system is so well organized!).
It would be really helpful if you could point out the part that may be causing the discrepancy. Thank you!
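For reference, the "ari" in results.json is the adjusted Rand index computed over segmentation masks, typically restricted to foreground pixels. A minimal illustrative sketch using scikit-learn; the helper name and the convention that label 0 is background are assumptions, not this library's code:

```python
# Illustrative (foreground) ARI for object segmentation masks.
# Assumptions: integer mask arrays, label 0 = background; this is NOT
# the library's own implementation, just a sketch with scikit-learn.
import numpy as np
from sklearn.metrics import adjusted_rand_score

def foreground_ari(true_mask, pred_mask):
    """ARI over pixels whose ground-truth label is foreground (non-zero)."""
    fg = true_mask.ravel() != 0
    return adjusted_rand_score(true_mask.ravel()[fg], pred_mask.ravel()[fg])

true_mask = np.array([[0, 1, 1], [0, 2, 2]])
pred_mask = np.array([[0, 5, 5], [0, 7, 7]])  # same grouping, different ids
print(foreground_ari(true_mask, pred_mask))  # → 1.0
```

ARI is invariant to label permutation, so a prediction that groups pixels correctly scores 1.0 even if its slot ids differ from the ground truth.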
train_config.yaml
seed: 12345
device: cuda
debug: false
num_workers: 4
allow_resume: false
trainer:
  clip_grad_norm: null
  logweights_steps: 1000
  logimages_steps: 10000
  logloss_steps: 1000
  checkpoint_steps: 1000
  logvalid_steps: 25000
  resubmit_steps: null
  resubmit_hours: null
  _target_: models.slot_attention.trainer.SlotAttentionTrainer
  steps: 500000
  use_warmup_lr: true
  warmup_steps: 10000
  use_exp_decay: true
  exp_decay_rate: 0.5
  exp_decay_steps: 100000
  optimizer_config:
    alg: Adam
    lr: 0.0004
dataset:
  output_features: all
  skip_loading: false
  _target_: data.datasets.Clevr
  width: 128
  height: 128
  num_background_objects: 1
  max_num_objects: 11
  name: clevr
  input_channels: 3
  dataset_path: clevr_10-full.hdf5
  downstream_features: [x, 'y', size, shape, material, color]
  data_sizes: [90000, 5000, 5000]
model:
  height: ${dataset.height}
  width: ${dataset.width}
  _target_: models.slot_attention.model.SlotAttentionAE
  name: slot-attention
  num_slots: 7
  latent_size: 64
  encoder_params:
    channels: [64, 64, 64, 64]
    kernels: [5, 5, 5, 5]
    paddings: [2, 2, 2, 2]
    strides: [1, 2, 2, 1]
  decoder_params:
    conv_transposes: true
    channels: [64, 64, 64, 64, 64, 4]
    kernels: [5, 5, 5, 5, 5, 3]
    strides: [2, 2, 2, 2, 1, 1]
    paddings: [2, 2, 2, 2, 2, 1]
    output_paddings: [1, 1, 1, 1, 0, 0]
    activations: [relu, relu, relu, relu, relu, null]
  attention_iters: 3
  mlp_size: 128
  eps: 1.0e-08
  h_broadcast: 8
  w_broadcast: 8
batch_size: 32
uuid: 1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb
uuid: 1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb
results.json
[
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "ari",
"metric_value": 0.6597288250923157
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "mean_segcover",
"metric_value": 0.1793033480644226
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "scaled_segcover",
"metric_value": 0.24997809529304504
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "mse",
"metric_value": 0.000645454041659832
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "mse_unmodified_fg",
"metric_value": 0.00045366105041466653
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "mse_fg",
"metric_value": 0.00045366105041466653
}
}
]
Also, please let me know if you plan to release the numerical results of your experiments. The paper presents results only as bar graphs, so exact numbers would be really helpful for users who want to compare against this work.
The reason the results were not reproduced is that I missed +dataset.variant=6
in the command.
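Putting the missing override together with the original command, the working invocation is:

```shell
# Train on the CLEVR6 variant (scenes with at most 6 objects), which
# matches the 7-slot default in the provided config.
python train_object_discovery model=slot-attention dataset=clevr \
    +dataset.variant=6 ++num_workers=4
```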
Anyway, thanks again for sharing this valuable code!
Hi, thanks for the kind words!
Exactly: you have been training on CLEVR10 using only 7 slots, which is the default for CLEVR6. You found the right fix. Alternatively, you could keep CLEVR10 but change the number of slots to 11.
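The alternative mentioned above would look something like the following Hydra override; the exact override path model.num_slots is an assumption based on where num_slots sits in the train_config.yaml shown earlier:

```shell
# Keep the full CLEVR10 data but raise the slot count to cover up to
# 11 objects (10 objects + 1 background slot).
# NOTE: model.num_slots=11 is an assumed override path, inferred from
# the posted config, not verified against the repository.
python train_object_discovery model=slot-attention dataset=clevr \
    model.num_slots=11 ++num_workers=4
```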