microsoft/Semi-supervised-learning

Cannot reproduce the results of FreeMatch

skingorz opened this issue · 9 comments

I tried to reproduce FreeMatch on CIFAR-100 using the following command:

python train.py --c config/usb_cv/freematch/freematch_cifar100_400_0.yaml

Compared to the original config, I made only the following changes:

multiprocessing_distributed: False
gpu: 0

But the error rate I reproduced is 0.1754 (17.54%), which is higher than the 15.65±0.26 reported in this line. Although I only tested with seed=0, the gap is quite large. Is there anything I have missed?
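For scale, here is a quick back-of-envelope check of that gap, using only the numbers quoted above:

# Back-of-envelope: gap between the reproduced error rate and the reported
# mean, in units of the reported standard deviation (numbers from this thread).
reproduced = 17.54                  # 0.1754 expressed as a percentage
reported_mean, reported_std = 15.65, 0.26
gap = reproduced - reported_mean
print(f"gap = {gap:.2f} points ~= {gap / reported_std:.1f} reported stds")

That works out to roughly seven reported standard deviations, so it is unlikely to be seed noise alone.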


Hi, can you check the hyper-parameters in this file: https://drive.google.com/drive/folders/1oON5Vyjvb3vWxOQl7hdUl-eh0K-TEPPS.

The config file is slightly different from what we use for reporting the results.
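For anyone following along, a quick way to find such discrepancies is to diff the local YAML against the reference one from that Drive folder. A minimal sketch, assuming both configs are plain YAML (the file paths are placeholders for your local copies; requires PyYAML):

# Sketch: diff two USB-style YAML configs to find mismatched hyper-parameters.
# File paths are hypothetical placeholders; adjust to your local copies.
import yaml

with open("freematch_cifar100_400_0.yaml") as f:
    local_cfg = yaml.safe_load(f)
with open("reference_freematch_cifar100_400_0.yaml") as f:
    ref_cfg = yaml.safe_load(f)

# Report keys whose values differ, plus keys present in only one file.
for key in sorted(set(local_cfg) | set(ref_cfg)):
    lv, rv = local_cfg.get(key, "<missing>"), ref_cfg.get(key, "<missing>")
    if lv != rv:
        print(f"{key}: local={lv} reference={rv}")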

Thanks for your response. I have checked the hyper-parameters in my log and found that clip_thresh was set to False, but I cannot find clip_thresh anywhere in the config file, which does not contain this argument. The following is my log:

[2023-11-24 00:13:36,410 INFO] Use GPU: 0 for training
[2023-11-24 00:13:38,240 INFO] unlabeled data number: 50000, labeled data number 400
[2023-11-24 00:13:39,201 INFO] Create train and test data loaders
[2023-11-24 00:13:39,202 INFO] [!] data loader keys: dict_keys(['train_lb', 'train_ulb', 'eval'])
[2023-11-24 00:13:40,203 INFO] Create optimizer and scheduler
[2023-11-24 00:13:40,211 INFO] Number of Trainable Params: 21436900
[2023-11-24 00:13:40,338 INFO] Arguments: Namespace(T=0.5, algorithm='freematch', amp=False, batch_size=8, c='config/usb_cv/freematch/freematch_cifar100_400_0.yaml', clip=0.0, clip_grad=0, clip_thresh=False, crop_ratio=0.875, data_dir='./data', dataset='cifar100', dist_backend='nccl', dist_url='tcp://127.0.0.1:26868', distributed=True, ema_m=0.0, ema_p=0.999, ent_loss_ratio=0.001, epoch=200, eval_batch_size=16, gpu=0, hard_label=True, imb_algorithm=None, img_size=32, include_lb_to_ulb=True, layer_decay=0.5, lb_dest_len=400, lb_imb_ratio=1, load_path='./saved_models/usb_cv//freematch_cifar100_400_0/latest_model.pth', lr=0.0005, max_length=512, max_length_seconds=4.0, momentum=0.9, multiprocessing_distributed=True, net='vit_small_patch2_32', net_from_name=False, num_classes=100, num_eval_iter=2048, num_labels=400, num_log_iter=256, num_train_iter=204800, num_warmup_iter=0, num_workers=4, optim='AdamW', overwrite=True, pretrain_path='https://github.com/microsoft/Semi-supervised-learning/releases/download/v.0.0.0/vit_small_patch2_32_mlp_im_1k_32.pth', rank=0, resume=True, sample_rate=16000, save_dir='./saved_models/usb_cv/', save_name='freematch_cifar100_400_0', seed=0, train_sampler='RandomSampler', ulb_dest_len=50000, ulb_imb_ratio=1, ulb_loss_ratio=1.0, ulb_num_labels=None, uratio=1, use_aim=False, use_cat=True, use_pretrain=True, use_quantile=False, use_tensorboard=True, use_wandb=False, weight_decay=0.0005, world_size=1)
[2023-11-24 00:13:40,339 INFO] Resume load path ./saved_models/usb_cv//freematch_cifar100_400_0/latest_model.pth does not exist
[2023-11-24 00:13:40,339 INFO] Model training
[2023-11-24 00:22:55,180 INFO] 256 iteration USE_EMA: False, train/sup_loss: 1.8427, train/unsup_loss: 1.5923, train/total_loss: 3.4083, train/util_ratio: 1.0000, train/run_time: 0.0531, lr: 0.0005, train/prefetch_time: 0.0025 
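(Side note: the Arguments line above prints the fully resolved argparse Namespace, so an option such as clip_thresh can show up with its default value even if the YAML never sets it. A minimal sketch for separating YAML-set values from defaults; split_yaml_vs_defaults is a hypothetical helper and assumes the config is plain YAML:)

import yaml
from argparse import Namespace

def split_yaml_vs_defaults(args: Namespace, cfg_path: str) -> None:
    """Print each resolved argument and whether it came from the YAML
    config or fell back to an argparse default."""
    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)
    for name, value in sorted(vars(args).items()):
        source = "yaml" if name in cfg else "argparse default"
        print(f"{name} = {value}  ({source})")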

Also, how many GPUs were used for the training run in this log? Does the number of GPUs have a significant impact on performance?
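For context, in a typical DDP setup each process draws its own batch, so the effective batch size scales with the number of GPUs, which can shift results. A rough sketch with the numbers from the log above (whether USB treats batch_size as per process should be verified against the code):

# Rough sketch: effective batch size under distributed data parallel.
# Assumes batch_size and uratio are per-process values, which should be
# verified against the USB code; the numbers are taken from the log above.
batch_size = 8      # labeled batch per process (from the log)
uratio = 1          # unlabeled:labeled ratio (from the log)
world_size = 1      # number of GPU processes

labeled = batch_size * world_size
unlabeled = batch_size * uratio * world_size
print(f"effective batch per step: {labeled} labeled + {unlabeled} unlabeled")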

My result is 82.08, the same as yours, but in the paper the error rate is 36%. The gap is too large; what is the problem?


The error rate is 38%.

Do you use the pretrained ViT? The results in the paper were obtained using TorchSSL, which trains a Wide ResNet from scratch.
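That is, the two numbers come from different training setups and are not directly comparable. As a quick check, these are the relevant fields from the Arguments line in the log above (values copied from that log, not a new config):

# Backbone settings from the log above: USB fine-tunes a pretrained ViT,
# while TorchSSL (the paper's codebase) trains a Wide ResNet from scratch,
# so the two error rates are not comparable.
run_args = {
    "net": "vit_small_patch2_32",
    "use_pretrain": True,
    "pretrain_path": (
        "https://github.com/microsoft/Semi-supervised-learning/releases/"
        "download/v.0.0.0/vit_small_patch2_32_mlp_im_1k_32.pth"
    ),
}
if run_args["use_pretrain"]:
    print(f"pretrained backbone: {run_args['net']}")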


Which config file are you using?

It has been solved.

Closing as no one is continuing the discussion; will re-open if there are still issues.