Question for reproducing the numbers of slot attention on CLEVR
kdwonn opened this issue · 3 comments
Hello. First of all, thank you for sharing valuable resources!
Thanks to this repo, I was able to easily kickstart research in a new field.
I have a question about reproducing the results on CLEVR10 using the provided .yaml training config.
When I run the experiment with the command python train_object_discovery model=slot-attention dataset=clevr ++num_workers=4, the ARI on the original dataset reaches only ~0.66, which suggests that the result is not properly reproduced.
I am attaching the training config and evaluation results together (BTW, the logging system is so well organized!).
It would be really helpful if you could point out the part that may be causing the discrepancy. Thank you!
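For reference, the "ari" in results.json is the adjusted Rand index computed over segmentation masks, typically restricted to foreground pixels. A minimal illustrative sketch using scikit-learn; the helper name and the convention that label 0 is background are assumptions, not this library's code:

```python
# Illustrative (foreground) ARI for object segmentation masks.
# Assumptions: integer mask arrays, label 0 = background; this is NOT
# the library's own implementation, just a sketch with scikit-learn.
import numpy as np
from sklearn.metrics import adjusted_rand_score

def foreground_ari(true_mask, pred_mask):
    """ARI over pixels whose ground-truth label is foreground (non-zero)."""
    fg = true_mask.ravel() != 0
    return adjusted_rand_score(true_mask.ravel()[fg], pred_mask.ravel()[fg])

true_mask = np.array([[0, 1, 1], [0, 2, 2]])
pred_mask = np.array([[0, 5, 5], [0, 7, 7]])  # same grouping, different ids
print(foreground_ari(true_mask, pred_mask))  # → 1.0
```

ARI is invariant to label permutation, so a prediction that groups pixels correctly scores 1.0 even if its slot ids differ from the ground truth.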
train_config.yaml
seed: 12345
device: cuda
debug: false
num_workers: 4
allow_resume: false
trainer:
  clip_grad_norm: null
  logweights_steps: 1000
  logimages_steps: 10000
  logloss_steps: 1000
  checkpoint_steps: 1000
  logvalid_steps: 25000
  resubmit_steps: null
  resubmit_hours: null
  _target_: models.slot_attention.trainer.SlotAttentionTrainer
  steps: 500000
  use_warmup_lr: true
  warmup_steps: 10000
  use_exp_decay: true
  exp_decay_rate: 0.5
  exp_decay_steps: 100000
  optimizer_config:
    alg: Adam
    lr: 0.0004
dataset:
  output_features: all
  skip_loading: false
  _target_: data.datasets.Clevr
  width: 128
  height: 128
  num_background_objects: 1
  max_num_objects: 11
  name: clevr
  input_channels: 3
  dataset_path: clevr_10-full.hdf5
  downstream_features: [x, 'y', size, shape, material, color]
  data_sizes: [90000, 5000, 5000]
model:
  height: ${dataset.height}
  width: ${dataset.width}
  _target_: models.slot_attention.model.SlotAttentionAE
  name: slot-attention
  num_slots: 7
  latent_size: 64
  encoder_params:
    channels: [64, 64, 64, 64]
    kernels: [5, 5, 5, 5]
    paddings: [2, 2, 2, 2]
    strides: [1, 2, 2, 1]
  decoder_params:
    conv_transposes: true
    channels: [64, 64, 64, 64, 64, 4]
    kernels: [5, 5, 5, 5, 5, 3]
    strides: [2, 2, 2, 2, 1, 1]
    paddings: [2, 2, 2, 2, 2, 1]
    output_paddings: [1, 1, 1, 1, 0, 0]
    activations: [relu, relu, relu, relu, relu, null]
  attention_iters: 3
  mlp_size: 128
  eps: 1.0e-08
  h_broadcast: 8
  w_broadcast: 8
batch_size: 32
uuid: 1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb
uuid: 1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb
results.json
[
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "ari",
"metric_value": 0.6597288250923157
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "mean_segcover",
"metric_value": 0.1793033480644226
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "scaled_segcover",
"metric_value": 0.24997809529304504
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "mse",
"metric_value": 0.000645454041659832
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "mse_unmodified_fg",
"metric_value": 0.00045366105041466653
}
},
{
"train_config.uuid": "1c8ed864-0352-4b1a-9812-7bb2e9b0e8cb",
"eval_config": {
"variant_type": "original",
"checkpoint_path": "/data02/dongwon/object-centric-library/outputs/runs/slot-attention-clevr-2023-07-07_20-14-41",
"device": "cuda",
"seed": 12345,
"batch_size": 64,
"dataset_size": null,
"starting_index": null
},
"results": {
"metric_name": "mse_fg",
"metric_value": 0.00045366105041466653
}
}
]
Also, please let me know if you plan to release the numerical results of your experiments. The paper presents results only as bar graphs, so exact numbers would be really helpful for users who want to compare against this work.
The reason the results were not reproduced is that I missed +dataset.variant=6
in the command.
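Putting the missing override together with the original command, the working invocation is:

```shell
# Train on the CLEVR6 variant (scenes with at most 6 objects), which
# matches the 7-slot default in the provided config.
python train_object_discovery model=slot-attention dataset=clevr \
    +dataset.variant=6 ++num_workers=4
```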
Anyway, thanks again for sharing this valuable code!
Hi, thanks for the kind words!
Exactly: you have been training on CLEVR10 using only 7 slots, which is the default for CLEVR6. You found the right fix. Alternatively, you could keep CLEVR10 but change the number of slots to 11.
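The alternative mentioned above would look something like the following Hydra override; the exact override path model.num_slots is an assumption based on where num_slots sits in the train_config.yaml shown earlier:

```shell
# Keep the full CLEVR10 data but raise the slot count to cover up to
# 11 objects (10 objects + 1 background slot).
# NOTE: model.num_slots=11 is an assumed override path, inferred from
# the posted config, not verified against the repository.
python train_object_discovery model=slot-attention dataset=clevr \
    model.num_slots=11 ++num_workers=4
```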